3

I have a simple script that pulls the SMART data from a series of hard drives and writes it to a timestamped log file which is later logged and parsed for relevant data.

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i >> /path/to/location/$filename
done 

Since this takes several seconds to run, I would like to find a way to parallelize it. I've tried just appending an '&' to the end of the single line in the loop; however, that causes the text file to be written haphazardly as sections finish rather than sequentially and in a readable manner. Is there a way to fork this into separate processes for each drive and then pipe the output back into an orderly text file?

Also, I assume setting the filename variable will have to be moved into the for loop in order for the forks to be able to access it. That causes an issue however if the script runs long enough to roll over into a new minute (or two) and then the script becomes sequentially datestamped fragments rather than one contiguous file.
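On the filename worry: jobs started with '&' are forked from the script's own shell, so they inherit any variable that was already set. A minimal sketch, assuming a stand-in function (report_name, hypothetical) in place of smartctl, shows a name computed once before the loop being visible in every background job:

```bash
#!/usr/bin/env bash
# Computed once, before any job starts, so a run that crosses a
# minute boundary still uses a single file name.
filename="filename$(date '+%Y_%m_%d_%H%M').txt"

# Hypothetical stand-in for the real per-drive command.
report_name() {
    printf '%s sees %s\n' "$1" "$filename"
}

# Each background job is a fork of this shell and inherits $filename.
# The order of the three lines is not guaranteed.
names=$(
    for i in a b c; do
        report_name "$i" &
    done
    wait    # let every background job finish before the capture ends
)
printf '%s\n' "$names"
```

So the assignment can stay outside the loop; only the ordering of the output needs a separate fix.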

5
  • I assume you can start a new bash and put it in the background? Oh I see, the problem is that the resulting file is garbled. Catch the results in an array or differently named vars, wait for all children and copy the vars in the desired order when all are done. Commented Sep 30, 2015 at 5:24
  • I'm not sure what you mean.... The script runs automatically upon a successful network connection to archive the created log file on a remote ftp server. Commented Sep 30, 2015 at 5:27
  • I tried that as well (allowing each forked process to spawn its own temp file then concatenating them all into one master before archiving) however that actually runs slower than simply executing the for loop as written one drive at a time. Commented Sep 30, 2015 at 5:37
  • What's the point of {a,b,c,...} when a b c ... is both shorter and 100% portable? Don't write bash scripts. Write shell scripts. Commented Sep 30, 2015 at 9:52
  • @Jens Yes, I'm sure there will always be a way to write a shorter script, however the question was about a for loop in bash because it's part of a larger script that's already written in ... wait for it... bash. :D Commented Oct 1, 2015 at 1:14

2 Answers

2

With GNU Parallel like this:

parallel -k 'smartctl -a /dev/{}' ::: a b c d e f g h  i j k l m n o p > path/to/output

The -k option keeps the output in order. Add -j 8 if you want to run, say, 8 at a time; otherwise it will run one job per core at a time. Or -j 16 if you want to run them all at once...

parallel -j 16 -k 'smartctl ....

Of course, if you are in bash you can do this too:

parallel -j 16 -k 'smartctl -a /dev/{}' ::: {a..p} > path/to/output
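A hedged sketch that combines this with the question's timestamped file name and the /dev/sd prefix that smartctl expects. The helper name preview_smart_jobs is hypothetical; it uses GNU Parallel's --dry-run so the commands are only printed, in input order, rather than executed:

```bash
#!/usr/bin/env bash
# Computed once, so all output lands in one file even if the run
# crosses a minute boundary.
filename="filename$(date '+%Y_%m_%d_%H%M').txt"

# Hypothetical helper: print the commands GNU Parallel would run.
# --dry-run prints instead of executing; -k keeps input order.
preview_smart_jobs() {
    parallel --dry-run -k -j 16 'smartctl -a /dev/sd{}' ::: "$@"
}
```

Calling preview_smart_jobs {a..p} lists the sixteen smartctl commands. The real run would presumably drop --dry-run and redirect once: parallel -k -j 16 'smartctl -a /dev/sd{}' ::: {a..p} > "/path/to/location/$filename"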

2 Comments

This is what I was hoping for. The only change I made was that /dev/{} became /dev/sd{} to reflect the input smartctl was looking for, and I had to install parallel since for some reason it wasn't included in Ubuntu 14.04. Parallelizing the smartctl instances cut down the script's execution time dramatically.
Excellent! Sorry I missed the sd part of /dev. Glad it worked out for you.
1

Wouldn't something like this work? (not tested)

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i > /path/to/location/$filename.$i &
done
wait
cat /path/to/location/$filename.* > /path/to/location/$filename

EDIT: it looks like the final cat is slow, so what about this version?

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
tmpdir="/dev/shm/tmp$( date '+%Y_%m_%d_%H%M' )"
mkdir $tmpdir
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i > $tmpdir/$filename.$i &
done
wait
cat $tmpdir/$filename.* > /path/to/location/$filename
rm -rf $tmpdir
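A variation on the same idea, as a sketch only: the names collect_in_order and smart_report are hypothetical, mktemp -d replaces the timestamped directory name so two runs cannot collide, and the pieces are concatenated in argument order rather than glob order.

```bash
#!/usr/bin/env bash
# collect_in_order CMD OUTFILE ITEM...
# Runs "CMD ITEM" in the background for every ITEM, one temp file per
# job, then concatenates the temp files in argument order.
# Assumes each ITEM is a plain name without slashes (e.g. a drive letter).
collect_in_order() {
    local cmd=$1 outfile=$2
    shift 2
    local tmpdir item
    tmpdir=$(mktemp -d) || return 1

    for item in "$@"; do
        "$cmd" "$item" > "$tmpdir/$item" &
    done
    wait    # block until every background job has exited

    for item in "$@"; do
        cat "$tmpdir/$item"
    done > "$outfile"
    rm -rf "$tmpdir"
}

# Hypothetical wrapper around the command from the question.
smart_report() {
    smartctl -a "/dev/sd$1"
}
```

With the question's variables this would be called as collect_in_order smart_report "/path/to/location/$filename" {a..p}.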

6 Comments

Thank you. That is what was suggested earlier, but it actually seems to run slower than the original for loop. I'm hoping for a way to parallelize it without having to create and then cat multiple files.
Forgive me if the answer to this is obvious, but is using tmpdir to store the forked output before concatenating it all back together substantially different from storing the files locally before concatenating them the same way? The system this is running on does mount the tmp directory in RAM, so theoretically that would be incrementally faster than a spinning-platter HDD, but it's using an SSD, so I can't imagine there is more than a few milliseconds' difference.
IDK... I was just trying to make the final cat faster... But actually, if I read you correctly, it's the multiple concurrent instances of smartctl that slow down the whole process... So maybe running them in parallel is a bad idea in the first place. Maybe you can test by hand how many you can run in parallel without losing performance, and stick to that number with a double loop in your script, with only the inner one parallelised...
I don't believe the multiple occurrences of smartctl are the bottleneck. It's running on an 8-core machine right around the 5 GHz mark, and since there's only one operation per HDD, it seems that running them in parallel would reduce the overall time drastically vs. consecutive instances. Similarly, the HDDs are attached in batches either via SATA III ports on the mobo or on PCI-E 2.0 riser cards, so I don't think there is a bandwidth bottleneck anywhere in that line.
It's possible that smartctl -a is very fast. (I suppose from skimming the man page that it just requests information already available in the disk and doesn't actually trigger any tests.) It's potentially much faster than actual disk access, for example when you want to write the result file. (Although, given enough cache, writing to the disk should also be fast, unless you mounted it with -o sync and turned off the disk cache with hdparm -W0, or you have lots of other I/O going on). So it looks as if writing to disk is the bottleneck... (ct'd)
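The batching experiment suggested in the comments above could be sketched roughly like this. The name run_in_batches is hypothetical, and the output order within a batch is not preserved, so this is meant for timing different batch sizes rather than for producing the final log:

```bash
#!/usr/bin/env bash
# run_in_batches SIZE CMD ITEM...
# Starts "CMD ITEM" in the background, at most SIZE at a time, and
# waits for each batch to finish before starting the next one.
run_in_batches() {
    local size=$1 cmd=$2
    shift 2
    local count=0 item
    for item in "$@"; do
        "$cmd" "$item" &
        count=$((count + 1))
        if [ "$count" -ge "$size" ]; then
            wait        # the whole batch must finish first
            count=0
        fi
    done
    wait                # final, possibly partial, batch
}
```

Assuming a wrapper such as smart_report() { smartctl -a "/dev/sd$1"; }, something like time run_in_batches 4 smart_report {a..p} > /dev/null could then be compared against batch sizes of 1, 8 and 16.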