I have an archive of photos stored in a directory tree on my Mac like:
./2016/05/17/photo-312.jpg
./2016/05/19/photo-1234.jpg
./2016/05/19/photo-5678.jpg
I want to create MD5 hashes of each file that can be used to verify the photos have not been altered or corrupted. My goals are:
- One MD5 file per photo
- Store the MD5 files in the same directory as their corresponding photos
- Use the same base name as the photo, but switch the extension to .md5
- Capture only the hash value (e.g. b1046abbe7bbf2a2473e9489599f38e0) without any trailing spaces or newlines
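To make the last goal concrete, here is a minimal sketch of writing a bare hash with no trailing newline and checking the resulting file size. The temporary file and the variable names are only for illustration; the hash is the example value from the list above.

```shell
# Example hash value from the goals above (32 hex characters).
hash=b1046abbe7bbf2a2473e9489599f38e0

# Scratch file used only for this illustration.
out=$(mktemp)

# printf '%s' writes the value exactly as given, with no trailing newline
# (echo would append one).
printf '%s' "$hash" > "$out"

# Count the bytes written; tr strips the padding some wc implementations add.
bytes=$(wc -c < "$out" | tr -d ' ')
echo "$bytes"   # prints 32
```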
For example, the above directory structure would look like this after the process runs:
./2016/05/17/photo-312.jpg
./2016/05/17/photo-312.md5
./2016/05/19/photo-1234.jpg
./2016/05/19/photo-1234.md5
./2016/05/19/photo-5678.jpg
./2016/05/19/photo-5678.md5
(Note: I only need to run this process one time. The process I use to move photos into the archive will create the necessary MD5 files for new photos from this point forward.)
Here's the one-liner I came up with:
find . -type f -name "*.jpg" -exec bash -c 'printf "%s" $(md5 -q "$0") > "${0%.*}.md5"' {} \;
(Note: my machine has md5 instead of md5sum which I often see referenced. So, I'm using that.)
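For anyone on a machine with md5sum rather than md5, a rough sketch of a wrapper that prints only the hash with whichever tool is present. The function name bare_md5 is made up for this example, and it assumes md5 accepts -q as it does on my machine.

```shell
# bare_md5 FILE: print only the MD5 hash of FILE (hypothetical helper).
bare_md5() {
  if command -v md5 >/dev/null 2>&1; then
    # BSD/macOS style: -q prints the hash without the "MD5 (file) =" prefix.
    md5 -q "$1"
  else
    # md5sum style: reading from stdin keeps the file name out of the output,
    # and cut keeps only the first space-separated field (the hash).
    md5sum < "$1" | cut -d ' ' -f 1
  fi
}

# Example: bare_md5 ./2016/05/17/photo-312.jpg
```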
Here are a few details on how I understand this to work:
The first section runs a basic find command on the current directory (i.e. ".") looking for .jpg files and sends them to bash with -exec bash -c:
find . -type f -name "*.jpg" -exec bash -c
Bash runs printf to set up a string that doesn't have a newline:
printf "%s"
This section generates the hash that is fed into printf as the string:
$(md5 -q "$0")
The -q flag tells md5 to output only the hash instead of the standard MD5 output, which would look something like:
MD5 (photo-312.jpg) = b1046abbe7bbf2a2473e9489599f38e0
The value of $0 is the relative path to the source .jpg file that find sent to bash.
This section creates the file path to store the value in, where the original extension is replaced by .md5:
"${0%.*}.md5"
More details about what's going on there can be found in the ${parameter%word} section of the Bash Manual.
The last little bit is:
{} \;
I'm not sure why, but the {} is necessary to make this run. (My understanding is that it's a reference to the file path. I don't know how that ties in, but "md5: bash: No such file or directory" errors pop up if it's not there.)
Finally, the \; identifies the end of find's -exec.
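A small experiment that may explain the {} behaviour. As far as I can tell, bash -c assigns the first word after the script string to $0 and the next to $1, and find substitutes the file path for {}. The sample path is just the one from my directory listing; nothing here touches real files.

```shell
# The first word after the script becomes $0 inside the script.
first=$(bash -c 'printf "%s" "$0"' ./2016/05/17/photo-312.jpg)
echo "$first"    # prints ./2016/05/17/photo-312.jpg

# With a placeholder in the $0 slot, the path lands in $1 instead.
second=$(bash -c 'printf "%s" "$1"' _ ./2016/05/17/photo-312.jpg)
echo "$second"   # prints ./2016/05/17/photo-312.jpg

# With no extra words at all, $0 falls back to the shell's own name,
# which would explain md5 complaining about a file called "bash".
none=$(bash -c 'printf "%s" "$0"')
echo "$none"

# ${path%.*} removes the shortest trailing match of ".*" (the last extension).
path=./2016/05/17/photo-312.jpg
md5_path="${path%.*}.md5"
echo "$md5_path" # prints ./2016/05/17/photo-312.md5
```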
While I normally use other languages for this type of work, I decided to try this with bash to get some practice with it. I've done some basic testing and everything appears to work as expected. Given my infrequent use of bash, I'd like to make sure I'm not getting myself in trouble. So, my questions are:
Are there any gotchas in this code that are waiting to bite me?
Is there a more standard or efficient way to do this?
UPDATE: I modified my code based on the answers. In case it's useful, here's what I ended up with:
find . -type f \( -name '*.cr2' -or -name '*.jpg' \) -execdir sh -c 'sha1sum "$1" > "${1%.*}.sha1"' -- {} \;
Which:
- Allows for multiple file extensions to be processed at the same time.
- Uses -execdir instead of -exec so the default output of the hashing algorithm doesn't contain paths. (Which is one reason I was trying to strip them originally.)
- Instead of md5, uses sha1sum, which provides a -c flag (sha1sum -c) for verifying files and didn't require installation via Homebrew.
- Uses the more appropriate ${1%.*} (with the help of the -- at the end) instead of ${0%.*} to remove the initial file extension.
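For the verification side, here is a rough sketch of how the .sha1 files could be checked later. It assumes each .sha1 file holds standard sha1sum output naming a file in the same directory, as the command above produces. The function name verify_tree is made up for this example, and it changes directory inside sh -c rather than relying on -execdir.

```shell
# verify_tree DIR: run sha1sum -c on every .sha1 file under DIR (hypothetical helper).
verify_tree() {
  find "$1" -type f -name '*.sha1' -exec sh -c '
    # $1 is the path to one checksum file. Move into its directory so the
    # relative file name stored inside the checksum file resolves correctly.
    cd "$(dirname "$1")" && sha1sum -c "$(basename "$1")"
  ' _ {} \;
}

# Example: verify_tree .
```

Each checksum file produces one result line per photo, such as "photo-312.jpg: OK" for a match or a FAILED line when the contents have changed.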
I moved from md5 to using sha1sum, which seems to come installed by default on Macs running 10.12 and provides the -c flag for verification.