#!/bin/bash

cd /path-to-directory
md5=$(find . -type f -exec md5sum {} \; | sort -k 2 | md5sum)

zenity --info \
--title= "Calculated checksum" \
--text= "$md5"

Calculating a recursive checksum for a directory takes a while. The bash script doesn't wait until that process has finished; it just moves on to the next command, which is the dialog box that displays the calculated checksum. So the dialog box shows a wrong checksum.

Is there an option to tell the script to wait until the checksum calculation has finished? Furthermore, is there a way to pipe the progress of the checksum calculation to some kind of progress bar, like zenity's for example?

  • Please review that code. It's invalid syntax as written. Commented Oct 25, 2022 at 13:22
  • Thanks. I corrected it. Commented Oct 25, 2022 at 13:29
  • What makes you think it doesn't wait? Are you sure that's the problem, or is it just that you see no text in the zenity box and guess it is because the md5sum hasn't finished? Commented Oct 25, 2022 at 13:38
  • Are you sure a space is allowed after the = in the zenity command? It is probably still invalid code. Commented Oct 25, 2022 at 13:41
  • Please add all this to your question so we don't waste your time or ours with wrong data. You now say it shows a wrong checksum? How is it wrong? How are you testing? If you don't explain what you are doing, we won't be able to help you. Commented Oct 25, 2022 at 14:07

2 Answers


As written, the script does wait for the checksum to be computed before showing the dialog. For a pulsating progress bar:

#! /bin/sh -
export LC_ALL=C
cd /path/to/dir || exit
{
  md5=$(
    find . -type f -print0 |
      sort -z |
      xargs -r0 md5sum |
      md5sum
  )
  exec >&-
  zenity --info \
         --title="Checksum" \
         --text="$md5"
} | zenity --progress \
           --auto-close \
           --auto-kill \
           --pulsate \
           --title="${0##*/}" \
           --text="Computing checksum"

For an actual progress bar, you'd need to know the number of files to process in advance.

With zsh:

#! /bin/zsh -
export LC_ALL=C
autoload zargs
cd /path/to/dir || exit
{
  files=(.//**/*(ND.))
} > >(
  zenity --progress \
         --auto-close \
         --auto-kill \
         --pulsate \
         --title=$0:t \
         --text="Finding files"
)
md5=(
  $(
   zargs $files -- md5sum \
      > >(
        awk -v total=$#files '/\/\// {print ++n * 100 / total}' | {
          zenity --progress \
            --auto-close \
            --title=$0:t \
            --text="Computing checksum" || kill -s PIPE $$
        }) \
      | md5sum
  )
)
zenity --info \
       --title=$0:t \
       --text="MD5 sum: $md5[1]"
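The same idea can be sketched in bash, as a rough translation rather than a drop-in replacement. It assumes GNU tools (find -print0, sort -z, xargs -r0, tee -p), bash 4.4 or later for mapfile -d '', and GNU md5sum, which escapes newlines in file names so that each file yields exactly one output line; that is what lets awk count lines here instead of relying on the .// marker. md5_with_progress is a hypothetical name, and unlike the zsh version, cancelling the dialog does not stop the computation.

```bash
#!/bin/bash
# md5_with_progress DIR  (hypothetical name; assumes GNU tools, bash >= 4.4)
# Prints the same combined checksum as the sh script above while feeding
# a determinate zenity progress bar.
md5_with_progress() (
  export LC_ALL=C
  cd -- "$1" || exit
  # Collect the sorted, NUL-delimited file list first to know the total.
  mapfile -d '' -t files < <(find . -type f -print0 | sort -z)
  total=${#files[@]}
  if ((total == 0)); then
    md5sum < /dev/null   # no files: same result the pipeline would give
    exit
  fi
  # GNU md5sum prints one line per file (newlines in names are escaped),
  # so awk can turn the running line count into a percentage.
  # tee -p keeps feeding the final md5sum even if the dialog goes away.
  printf '%s\0' "${files[@]}" |
    xargs -r0 md5sum |
    tee -p >(awk -v total="$total" '{ print int(NR * 100 / total) }' |
             zenity --progress --auto-close --title="${0##*/}" \
                    --text="Computing checksum") |
    md5sum
)
```

For example: md5=$(md5_with_progress /path/to/dir) && zenity --info --title="Checksum" --text="$md5"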

Note that outside the C locale, on GNU systems at least, the order in which file names sort is not deterministic: some distinct characters sort the same, and file names are not guaranteed to be made of valid text. Hence the LC_ALL=C above.

The C locale order is also very simple (based on byte value) and consistent from system to system and version to version.
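A quick illustration of that byte-value order: in the C locale, uppercase ASCII letters (0x41 to 0x5A) sort before _ (0x5F), which sorts before lowercase letters (0x61 to 0x7A), so this prints A, B, _z, a, b on any system.

```bash
#!/bin/bash
# The C locale compares raw byte values, independent of installed locales.
sorted=$(printf '%s\n' b a B A _z | LC_ALL=C sort)
printf '%s\n' "$sorted"
```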

Beware that this means error messages, if any, will be displayed in English instead of the user's language (but then again, the Computing checksum, Finding files, etc. messages are not localised either, so it's just as well).
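If you would rather keep error messages in the user's language, one alternative (not what this answer does) is to apply the C locale to sort only, since sort is the step whose output order the checksum depends on. This sketch assumes GNU tools, whose find -print0, xargs -0 and md5sum output should not depend on the locale; dir_md5 is a hypothetical name.

```bash
#!/bin/bash
# dir_md5 DIR  (hypothetical helper name)
# Only sort runs in the C locale; find, xargs and md5sum keep the
# user's locale, so their error messages stay in the user's language.
dir_md5() (
  cd -- "$1" || exit
  find . -type f -print0 |
    LC_ALL=C sort -z |
    xargs -r0 md5sum |
    md5sum
)
```

Usage: md5=$(dir_md5 /path/to/dir)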

Some other improvements over your approach:

  • Using -exec md5sum {} + or -print0 | xargs -r0 md5sum (or the zargs equivalent) minimises the number of md5sum invocations, as each md5sum invocation is passed a number of files. -exec md5sum {} \; runs one md5sum per file, which is very inefficient.
  • We sort the list of files before passing it to md5sum. Doing sort -k2 on md5sum's output doesn't work in general, as file names can contain newline characters; in general, it's wrong to process file paths line by line. You'll notice we use a .// prefix in the zsh approach so that awk can count files reliably. Some md5sum implementations also have a -z option for NUL-delimited records.
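Both points can be checked in a throwaway directory. This bash sketch creates one file whose name contains a newline plus four ordinary files, compares line-based and NUL-based counts, and counts how many times a command run by -exec is started with \; versus +. With GNU find, tr and wc you should see 6 lines but 5 records, and 5 invocations versus 1.

```bash
#!/bin/bash
demo=$(mktemp -d)
# One file whose name contains a newline, plus four ordinary files.
touch -- "$demo"/$'a\nb' "$demo"/f{1..4}

# Line-based counting sees the newline name as two records.
lines=$(find "$demo" -type f | wc -l)
# NUL-delimited: count the NUL terminators instead.
records=$(find "$demo" -type f -print0 | tr -cd '\0' | wc -c)

# Each invocation of the inner sh prints one line.
per_file=$(find "$demo" -type f -exec sh -c 'echo run' sh {} \; | wc -l)
batched=$(find "$demo" -type f -exec sh -c 'echo run' sh {} + | wc -l)

printf 'lines=%s records=%s per_file=%s batched=%s\n' \
  "$lines" "$records" "$per_file" "$batched"
rm -rf -- "$demo"
```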
  • Is it possible to use the variable $md5 (from the first example) globally? When I try to use it in a zenity dialog outside of the {} brackets it has no value. Commented Nov 7, 2022 at 13:33
  • @stewie, no: that {...} command group is part of a pipeline, so it runs in a subshell and its variables are lost once the pipeline ends. Commented Nov 7, 2022 at 13:47
  • Sad, thank you anyways. Commented Nov 7, 2022 at 13:55
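In the first script, the { ... } group is the left-hand side of a pipeline, so the shell runs it in a subshell and $md5 is gone afterwards. One way around it, sketched here for bash, is to keep the computation in the current shell and give zenity its input through a process substitution on a spare file descriptor (3 is an arbitrary choice); closing that descriptor gives zenity the same end-of-file that exec >&- produces in the answer. checksum_with_progress is a hypothetical name.

```bash
#!/bin/bash
# 1. The problem: a command group on the left of a pipe runs in a
#    subshell, so the assignment never reaches the current shell.
md5=unset
{ md5=$(printf 'x\n' | md5sum); } | cat
echo "md5 after the pipeline: $md5"   # still "unset"

# 2. A workaround sketch (hypothetical function name): run the checksum
#    in the current shell and hand zenity only a spare descriptor (3).
checksum_with_progress() {
  local dir=$1
  exec 3> >(zenity --progress --auto-close --pulsate \
                   --title="${0##*/}" --text="Computing checksum")
  # md5 is deliberately global so the caller can use it afterwards.
  md5=$(cd -- "$dir" &&
        find . -type f -print0 | LC_ALL=C sort -z | xargs -r0 md5sum | md5sum)
  exec 3>&-   # zenity now sees end-of-file on its stdin
}
```

Usage: checksum_with_progress /path/to/dir && zenity --info --title="Checksum" --text="$md5"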

That code will wait for the find and md5sum commands to finish. That's just the normal behavior, unless you have a & to send the commands to the background.
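A small bash illustration of the difference, with a hypothetical slow_checksum function standing in for the find ... | md5sum pipeline: the command substitution does not return until everything inside it has exited, whereas & returns at once and it is wait that blocks.

```bash
#!/bin/bash
# Hypothetical stand-in for the slow find ... | md5sum pipeline.
slow_checksum() {
  sleep 1
  printf 'some data\n' | md5sum
}

# Foreground: $( ) does not return until slow_checksum has exited,
# so md5 is complete by the time the next line runs.
start=$SECONDS
md5=$(slow_checksum)
elapsed=$((SECONDS - start))
printf 'foreground: %s (after %ss)\n' "$md5" "$elapsed"

# Background: with &, the shell moves on at once; wait is what blocks.
tmp=$(mktemp)
slow_checksum > "$tmp" &
pid=$!
printf 'right after &: "%s"\n' "$(cat -- "$tmp")"   # most likely still empty
wait "$pid"
bg_md5=$(cat -- "$tmp")
printf 'after wait:    %s\n' "$bg_md5"
rm -f -- "$tmp"
```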

However, your zenity command is malformed: you can't have a space after the =. So I am guessing that you are seeing an empty zenity window and that's why you think it isn't waiting. Try again, but remove the spaces:

#!/bin/bash

cd /path-to-directory
md5=$(find . -type f -exec md5sum {} \; | sort -k 2 | md5sum)

zenity --info \
--title="Calculated checksum" \
--text="$md5"
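To see what the shell actually hands to zenity, printf can stand in for it and print each argument between angle brackets. With a space after the =, the word ends there, so --text= arrives with an empty value and the checksum arrives as a separate argument. The md5 value below is only a placeholder (the MD5 of empty input).

```bash
#!/bin/bash
md5='d41d8cd98f00b204e9800998ecf8427e  -'   # placeholder checksum for the demo

# With a space: the word ends at the space, so --text= gets an empty
# value and the checksum becomes a separate argument.
broken=$(printf '<%s>\n' --text= "$md5")
# Without a space: one word, so the value stays attached to the option.
fixed=$(printf '<%s>\n' --text="$md5")

printf 'with space:\n%s\nwithout space:\n%s\n' "$broken" "$fixed"
```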

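You can also avoid the need to cd and make it a bit more concise if you do the following (note that the checksum then covers the full paths, so it will differ from the cd version):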

#!/bin/bash

zenity --info \
--title="Calculated checksum" \
--text="$(find /path-to-directory -type f -exec md5sum {} \; | sort -k 2 | md5sum)"

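If you need this to always return the same result, no matter what the parent path of the directory is, you can strip the file names from the output of the find ... md5sum command, keeping only each file's hash, and sort those hashes before passing them to the second md5sum. The final checksum then depends only on the files' contents, not on their names or locations: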

#!/bin/bash

zenity --info \
--title="Calculated checksum" \
--text="$(find /path-to-directory -type f -exec md5sum {} \; | cut -d ' ' -f1 | sort | md5sum)"
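A bash sketch to check that claim, with hypothetical helper names sum_with_paths and sum_without_names: it builds the same two files under two different parent directories and compares the checksums. It uses -exec md5sum {} + (one md5sum for many files, same output as \;) and LC_ALL=C for sort, as suggested in the other answer. Note that the hashes-only version also no longer changes when a file is renamed, since no names go into the final checksum.

```bash
#!/bin/bash
# Checksum that includes the paths exactly as find prints them.
sum_with_paths() {
  find "$1" -type f -exec md5sum {} + | LC_ALL=C sort -k 2 | md5sum
}
# Checksum of the per-file hashes only: cut drops the file names.
sum_without_names() {
  find "$1" -type f -exec md5sum {} + | cut -d ' ' -f1 | LC_ALL=C sort | md5sum
}

# The same two files under two different parent directories.
base=$(mktemp -d)
for root in "$base/one/data" "$base/two/elsewhere/data"; do
  mkdir -p "$root/sub"
  printf 'hello\n' > "$root/a.txt"
  printf 'world\n' > "$root/sub/b.txt"
done

with_1=$(sum_with_paths "$base/one/data")
with_2=$(sum_with_paths "$base/two/elsewhere/data")
without_1=$(sum_without_names "$base/one/data")
without_2=$(sum_without_names "$base/two/elsewhere/data")

[ "$with_1" = "$with_2" ] && echo "with paths: same" || echo "with paths: different"
[ "$without_1" = "$without_2" ] && echo "hashes only: same" || echo "hashes only: different"
rm -rf -- "$base"
```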
  • This is just a mistake I made in this forum because I didn't copy the original script; it is on a different machine. However, if I run the script like this, a zenity dialog pops up immediately and shows a checksum that isn't the correct one. I compared it to the output of a terminal that runs the "find . -type ... " code. Commented Oct 25, 2022 at 13:55
  • @stewie that isn't really possible. We need to see the exact commands you are running to be able to help, though. Commented Oct 25, 2022 at 14:12
  • Remember that the command is piped to a second md5sum. So your version runs on "[MD5 hash] /path-to-directory/path/to/dirname" while the original does "[MD5 hash] path/to/dirname". Try it and see. You get different results with and without the cd. The test as @stewie described would give different results. Also, the correct comparison would be to find /path-to-directory -type f -exec md5sum {} \; | sort -k 2 | md5sum or find . -type f -exec md5sum {} \; | sort -k 2 | md5sum. Doing the cd gives consistent results regardless of /path-to-directory. Commented Oct 26, 2022 at 0:12
  • You are right: I tried both versions on /run/media/"$USER"/directory and got different results. So if I want the correct checksum for the directory on my mounted device, I need to use the command find /run/media/"$USER"/directory -type f -exec md5sum {} \; | sort -k 2 | md5sum, right? Commented Oct 27, 2022 at 9:50
  • @stewie it depends on what you mean by "correct". The md5sum you are calculating is including the path. If you then want to compare this in order to check that the same files are present (which was never mentioned in the question), then you do indeed need to ignore the paths. So either cd into the directory first, or remove the paths from the output of -exec md5sum {} like this: find foo/ -type f -exec md5sum {} \; | cut -d ' ' -f1. See updated answer. Commented Oct 27, 2022 at 10:31
