I have a number of folders with my various media (e.g. photos, music) from different points in time. The folders share some content (e.g. a photo might be in 2 of them), but should be mostly unique. There are no guarantees about filenames across folders - the same photo might be present as A/foo.png and B/bar.png, while A/baz.png and B/baz.png might not be the same file.
I'm looking for a way to consolidate all of the media into a single, flat folder with duplicates removed. Ideally I'd also keep some record of where each file originally came from (e.g. knowing that output/001.png came from A/baz.png), but this isn't strictly necessary. There are a lot of files (1M+), so the faster the better :).
I originally tried to just copy all of the files from the folders into a new folder, but this took a long time, and it only deduplicates files whose names are identical, which isn't the case here. I think there might be some way to make this command faster with xargs -P, but I'm not sure how:
find . -type f -exec cp {} output/ \;
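For reference, something like the following is what I had in mind for the xargs -P version (a sketch only - it assumes GNU findutils/coreutils, uses A and B as placeholder source folders, and still doesn't solve the collision/dedup problem):

# Parallel copy sketch: -print0 / -0 handle odd filenames, -P 8 runs 8 cp
# processes at once, -n 64 batches 64 files per cp invocation.
# cp -t output/ (GNU cp) copies each batch into output/, but files that
# happen to share a name still overwrite each other.
mkdir -p output
find A B -type f -print0 | xargs -0 -P 8 -n 64 cp -t output/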
A two stage system or similar is fine - e.g. first flatten and rename all of the files into a new folder so that they all have unique filenames, and then filter out duplicates. I have the storage space to do that, I'm just not sure how to do it.
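For the first stage, this is roughly the kind of thing I was picturing (a bash sketch, with A and B as placeholder source folders and output/ as the destination): every file gets a sequential name that keeps its extension, and manifest.tsv maps each new name back to its original path so the provenance isn't lost.

# Stage 1 sketch: flatten + rename with a manifest.
mkdir -p output
i=0
find A B -type f -print0 | while IFS= read -r -d '' src; do
    base="${src##*/}"
    ext="${base##*.}"
    [ "$ext" = "$base" ] && ext=bin       # fallback for files with no extension
    printf -v name '%07d.%s' "$i" "$ext"  # e.g. 0000042.png
    cp "$src" "output/$name"
    printf '%s\t%s\n' "$name" "$src" >> output/manifest.tsv
    i=$((i+1))
done

This copies one file at a time, so it's slower than the parallel xargs version above, but it guarantees unique names and leaves stage 2 (duplicate removal) to a dedicated tool.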
My current plan is to use fdupes to find and delete the duplicates, then move everything to a single directory while taking care of filename collisions. Actually, I'm going to use jdupes, it seems faster than fdupes. The jdupes command:
jdupes copy -Z -r -d
I might rerun with jdupes copy -Z -r -d -N if it turns out there are too many duplicates to go through by hand. Do you have suggestions for how to copy / rename all of the files to a new folder? I could write a quick Python script to do it, but maybe there's a better / faster option.
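One other idea I'm toying with instead of a Python script (again just a sketch, with A and B as placeholder source folders): name each copy after its content hash, so flattening and deduplication happen in one pass and the separate fdupes/jdupes step isn't needed.

# Sketch: flatten, rename, and dedup in one pass by naming files after their
# SHA-256 hash (assumes GNU coreutils). Identical files hash to the same name,
# so duplicates collapse automatically; hashed.tsv records every original path.
mkdir -p output
find A B -type f -print0 | while IFS= read -r -d '' src; do
    hash=$(sha256sum "$src" | cut -d' ' -f1)
    base="${src##*/}"
    ext="${base##*.}"
    [ "$ext" = "$base" ] && ext=bin       # fallback for files with no extension
    dest="output/$hash.$ext"
    [ -e "$dest" ] || cp "$src" "$dest"   # skip if this content was already copied
    printf '%s\t%s\n' "$hash.$ext" "$src" >> output/hashed.tsv
done

Hashing 1M+ files is I/O heavy, so this single loop would probably want to be split and run in parallel, but I'm not sure whether it would beat the copy-then-jdupes approach in practice.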