

Find and delete all duplicate files by hash

As the title suggests, I'm looking to check a bunch of files on a Linux system and keep only one file for each unique hash. The filenames are irrelevant; the only important part is the hash itself.

I did find this question, which partly answers mine in that it finds all the duplicates:

https://superuser.com/questions/487810/find-all-duplicate-files-by-md5-hash

The linked question has this pipeline as an answer:

find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

Any ideas/suggestions on how to add deletion to this answer?

I guess I could use something like PHP or Python to parse the output: split the files into groups at the blank lines, skip the first entry in each group, and delete the rest (if they still exist).
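Alternatively, it occurs to me that the grouping step might be skippable entirely, since the hash in the first column already identifies duplicates. A rough, untested sketch of what I mean (this assumes GNU md5sum output, i.e. a 32-character hash, two spaces, then the path, and that no filenames contain newlines):

```shell
# Hash every file, sort so identical hashes are adjacent, then delete
# every file whose hash has already been seen (keeping the first one).
find . -type f -print0 | xargs -0 md5sum | sort |
  awk 'seen[$1]++ { print substr($0, 35) }' |   # path starts at column 35
  while IFS= read -r path; do
    rm -- "$path"
  done
```

This would keep whichever duplicate sorts first; filenames containing newlines would break the `read` loop and presumably need the scripted PHP/Python route instead.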