So, I have two directories: personal_files and personal_files_oldcopy.
After some file processing, I am no longer sure whether both directories have the same structure, or whether one of them has additional or missing files. Assume that I don't much care about the actual content of the files, just their presence, and thus any differences between the two file trees.
The directories each have a size of approximately 2TB, so an ordinary diff -r is not viable.
How can I quickly, transparently and easily compare the two directories, including their structure and the presence of named files?
Ideally, I want to know which files and directories are included in one but not the other. Bonus points if, by some checksumming magic or file size comparison, I can get a report about superficial differences between files that have the same location and name.
Example:
personal_files/
├── docs/
│   ├── resume.pdf
│   └── cover_letter.docx
├── music/
│   ├── rock/
│   │   └── song1.mp3
│   └── jazz/
│       └── smooth.mp3
└── notes.txt (Content: "I am a note")
personal_files_oldcopy/
├── docs/
│   ├── resume.pdf
│   ├── cover_letter.docx
│   └── old_portfolio.pdf
├── music/
│   ├── rock/
│   │   └── song1.mp3
│   └── pop/
│       └── hit_single.mp3
└── notes.txt (Content: "This is another different note")
Anyone wishing to recreate that directory structure to test with can do so by executing this script:
# Entries common to both trees
for dir in personal_files personal_files_oldcopy; do
    mkdir -p "$dir/docs"
    echo 'foo' > "$dir/docs/resume.pdf"
    echo 'foo' > "$dir/docs/cover_letter.docx"
    mkdir -p "$dir/music/rock"
    echo 'foo' > "$dir/music/rock/song1.mp3"
done

# Entries specific to personal_files
dir='personal_files'
mkdir -p "$dir/music/jazz"
echo 'foo' > "$dir/music/jazz/smooth.mp3"
echo 'I am a note' > "$dir/notes.txt"

# Entries specific to personal_files_oldcopy (notes.txt differs in content)
dir='personal_files_oldcopy'
echo 'foo' > "$dir/docs/old_portfolio.pdf"
mkdir -p "$dir/music/pop"
echo 'foo' > "$dir/music/pop/hit_single.mp3"
echo 'This is another different note' > "$dir/notes.txt"
The output should be something like this (obviously the presentation might differ):
Only in personal_files_oldcopy/docs: old_portfolio.pdf
Only in personal_files/music/jazz: smooth.mp3
Only in personal_files_oldcopy/music/pop: hit_single.mp3
// optional:
Present in both, possible content different: notes.txt (insert optional difference detection, checksum or size?)
// could be hidden:
Present in both: docs/resume.pdf
Present in both: docs/cover_letter.docx
Present in both: music/rock/song1.mp3
Potentially this could fold down to directory level if a whole directory is missing, instead of listing all the files within it, but that would move more into script territory, right?
Run find separately on both directories to get a list of everything you want to compare, then sort these outputs, so you can then compare both lists with diff.

vimdiff <(du -h personal_files) <(du -h personal_files_oldcopy)

rsync also provides some amazing directory/file comparison capabilities. You can use the -n (--dry-run) option to get a detailed listing of the differences in files between source and destination, based on size and modification time, not contents (e.g. rsync -uavn <src> <dest>).
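
The find-and-sort suggestion above can be sketched roughly as follows. This is only a minimal illustration, not a tested tool: compare_trees is a hypothetical helper name, the lists are compared with comm rather than diff so that "only in A" and "only in B" come out as separate groups, and it assumes no file name contains a newline. Only the directory entries are listed; file contents are never read, so it should stay cheap even on trees of a few terabytes.

```shell
#!/bin/sh
# Sketch: report paths that exist in only one of two directory trees.
# compare_trees is a hypothetical helper name, not an existing utility.
# Assumption: no path contains a newline character.
compare_trees() {
    local a="$1"
    local b="$2"
    local list_a
    local list_b
    list_a=$(mktemp) || return 1
    list_b=$(mktemp) || return 1

    # List every entry relative to its own root so both lists are comparable.
    # LC_ALL=C gives sort and comm the same byte-wise ordering.
    (cd "$a" && find . | LC_ALL=C sort) > "$list_a"
    (cd "$b" && find . | LC_ALL=C sort) > "$list_b"

    # comm -23 keeps lines unique to the first list, comm -13 those unique
    # to the second. The leading "./" is stripped for display.
    LC_ALL=C comm -23 "$list_a" "$list_b" | while IFS= read -r p; do
        printf 'Only in %s: %s\n' "$a" "${p#./}"
    done
    LC_ALL=C comm -13 "$list_a" "$list_b" | while IFS= read -r p; do
        printf 'Only in %s: %s\n' "$b" "${p#./}"
    done

    rm -f "$list_a" "$list_b"
}

# When run as a script with two arguments, compare those two directories.
if [ "$#" -eq 2 ]; then
    compare_trees "$1" "$2"
fi
```

Called as compare_trees personal_files personal_files_oldcopy on the example trees from the question, this should print:

Only in personal_files: music/jazz
Only in personal_files: music/jazz/smooth.mp3
Only in personal_files_oldcopy: docs/old_portfolio.pdf
Only in personal_files_oldcopy: music/pop
Only in personal_files_oldcopy: music/pop/hit_single.mp3

Note that a missing directory is listed together with everything inside it; folding that down to a single line per directory would need an extra filtering step. Replacing -23 / -13 with -12 would list the paths present in both trees instead.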