Duplicate Content Detection with fdupes Command
- Categories:
- tutorial
When I needed to identify duplicate content across a large number of files, I found a convenient Linux command that saved me from writing custom code.
Suppose all the files in your example directory have a .txt extension.
To find duplicate content among the .txt files in the example directory on a Linux system, you can use the fdupes command below:
fdupes -r -n example/
Explanation of the options used:
- -r: Recursively search for duplicate files in subdirectories.
- -n: Exclude zero-length (empty) files from consideration. By default, fdupes already lists only the files that have duplicates.
- example/: Target directory.
This command lists the groups of files with identical content found in the specified directory and its subdirectories. Review the output carefully before taking any action, such as deleting files.
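To make the idea of content-based grouping concrete, here is a minimal Python sketch of the general technique. It is not how fdupes is implemented: the function names `file_digest` and `find_duplicates` are hypothetical, and SHA-256 is an assumption chosen for illustration. Files are first bucketed by size, and only files of equal size are hashed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str, skip_empty: bool = True) -> list[list[Path]]:
    """Group files under root (recursively) that have identical content."""
    # Step 1: bucket files by size; files of different sizes cannot match.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if skip_empty and size == 0:
                continue  # mirrors the idea of ignoring empty files
            by_size[size].append(path)

    # Step 2: within each size bucket, group by content hash.
    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    # "example" is the directory name used in this post.
    for group in find_duplicates("example"):
        print(*group, sep="\n", end="\n\n")
```

Grouping by size first avoids hashing files that cannot possibly match, which matters when the directory is large.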
Please note that fdupes identifies duplicates by content, not by file name: it compares file sizes first, then MD5 signatures, and finally confirms each match with a byte-by-byte comparison. If you're specifically interested in finding files with the same names but different content, you might need a more advanced script or tool.
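For the opposite case, files that share a name but differ in content, a short script may be enough. The following is a rough Python sketch under a few assumptions: the function name `same_name_different_content` is hypothetical, SHA-256 is used to compare content, and each file is small enough to read into memory at once.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def same_name_different_content(root: str) -> dict[str, list[Path]]:
    """Map each file name to its paths when the copies differ in content."""
    # Collect every path under root, keyed by its base name.
    by_name = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_name[path.name].append(path)

    conflicts = {}
    for name, paths in by_name.items():
        # read_bytes() loads the whole file; fine for small text files.
        digests = {hashlib.sha256(p.read_bytes()).hexdigest() for p in paths}
        if len(digests) > 1:  # same name, more than one distinct content
            conflicts[name] = sorted(paths)
    return conflicts


if __name__ == "__main__":
    # "example" is the directory name used in this post.
    for name, paths in same_name_different_content("example").items():
        print(name)
        for p in paths:
            print("  ", p)
```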