
I have a number of folders with my various media (e.g. photos, music) from different points in time. The folders share some content (e.g. a photo might be in two folders), but most files should be unique. Filenames are not a reliable guide: the same photo might exist as A/foo.png and B/bar.png, while A/baz.png and B/baz.png might be different files.

I'm looking for a way to consolidate all of the media into a single, flat folder with duplicates removed. Ideally I'd also keep track of where each file originally came from (e.g. knowing that output/001.png came from A/baz.png), but this isn't strictly necessary. There are a lot of files (1M+), so the faster the better :).

I originally tried to just copy all of the files from the folders into a new folder, but this took a long time and only deduplicates files whose names are identical, which doesn't hold in this case. I think there might be a way to make this command go faster with xargs -P, but I wasn't sure how.

find . -type f -exec cp {} /path/to/output/ \;

A two-stage system or similar is fine, e.g. first flatten and rename all of the files into a new folder so that they all have unique filenames, and then filter out duplicates. I have the storage space to do that; I'm just not sure how to do it.
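One way to sketch the consolidation described above is to identify duplicates by content instead of by name: group files by size, hash only the files that share a size, and copy the first file seen for each distinct content. This is a minimal sketch, not a tuned tool. The names `file_digest`, `consolidate`, and the `manifest.csv` layout are hypothetical, SHA-256 is an assumed choice of hash, and the output folder is assumed to live outside the source folders.

```python
import csv
import hashlib
import shutil
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def consolidate(sources, output: Path, manifest: Path) -> int:
    """Copy one file per distinct content into `output`; return that count."""
    output.mkdir(parents=True, exist_ok=True)

    # Stage 1: group by size. A file with a unique size cannot have a duplicate,
    # so it never needs to be hashed.
    by_size = defaultdict(list)
    for root in sources:
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_size[path.stat().st_size].append(path)

    # Stage 2: hash only files that share a size, and copy first-seen content.
    seen = {}  # (size, digest or None) -> output file name
    with manifest.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["output_name", "original_path"])
        for size, paths in by_size.items():
            for path in paths:
                digest = file_digest(path) if len(paths) > 1 else None
                key = (size, digest)
                if key not in seen:
                    name = f"{len(seen) + 1:07d}{path.suffix.lower()}"
                    shutil.copy2(path, output / name)
                    seen[key] = name
                # Every original path is recorded, duplicates included.
                writer.writerow([seen[key], str(path)])
    return len(seen)
```

A call such as `consolidate(["A", "B"], Path("output"), Path("manifest.csv"))` would leave one copy of each distinct file in `output` and a CSV row per original file, so the origin of every output file can be looked up later. The size pre-filter matters at the 1M+ scale because it avoids reading files that cannot possibly be duplicates.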

  • Doing it the other way around would probably be better: use fdupes to find and delete the duplicates, then move everything to a single directory while taking care of filename collisions. Commented May 27, 2020 at 7:03
  • Or jdupes, it seems faster than fdupes. Commented May 27, 2020 at 7:06
  • Thanks for the suggestion. I've made a copy of all of my data and am working through deduplicating it now with jdupes: jdupes copy -Z -r -d. Might rerun with jdupes copy -Z -r -d -N if it turns out there are too many duplicates to go through by hand. Do you have suggestions for how to copy / rename all of the files to a new folder? I could write a quick Python script to do it, but maybe there's a better / faster option. Commented May 27, 2020 at 7:29
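
For the copy / rename step asked about in the last comment, a quick Python script could look roughly like the sketch below. It assumes duplicates have already been removed (for example with jdupes), gives every file a sequential name so that A/baz.png and B/baz.png cannot collide, and copies with a small thread pool. The function names `plan_flat_names` and `flatten` and the default of 8 workers are hypothetical, and the destination is assumed to be outside the source tree.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_flat_names(source: Path, dest: Path):
    """Pair every file under `source` with a unique sequential name in `dest`."""
    files = sorted(p for p in source.rglob("*") if p.is_file())
    # Zero-pad to at least six digits so names sort in numeric order.
    width = max(6, len(str(len(files))))
    return [
        (path, dest / f"{index:0{width}d}{path.suffix.lower()}")
        for index, path in enumerate(files, start=1)
    ]


def flatten(source: Path, dest: Path, workers: int = 8):
    """Copy all files into `dest` concurrently; return the (source, target) plan."""
    dest.mkdir(parents=True, exist_ok=True)
    plan = plan_flat_names(source, dest)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every copy and re-raises the first error, if any.
        list(pool.map(lambda pair: shutil.copy2(*pair), plan))
    return plan
```

The returned plan doubles as a record of where each output file came from. Whether the threads actually help depends on the storage: copies are I/O-bound, so a single spinning disk may see little or no speedup.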
