Is there a way to parallelize a bash for loop?

Question

I have a simple script that pulls the SMART data from a series of hard drives and writes it to a timestamped log file which is later logged and parsed for relevant data.

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i >> /path/to/location/$filename
done

Since this takes several seconds to run, I would like to find a way to parallelize it. I've tried appending an '&' to the end of the single line in the loop, but that causes the text file to be written haphazardly as sections finish rather than sequentially and in a readable manner. Is there a way to fork this into separate processes for each drive and then pipe the output back into an orderly text file?

Also, I assume setting the filename variable will have to be moved into the for loop in order for the forks to be able to access it. That causes an issue, however, if the script runs long enough to roll over into a new minute (or two): the output then becomes a set of sequentially datestamped fragments rather than one contiguous file.
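On the filename concern: a job started with & runs in a subshell that receives a copy of the parent shell's variables, so a value set once before the loop should be visible to every fork. A minimal sketch, assuming bash; report_name is a hypothetical stand-in for the smartctl call and only prints the name it would write to:

```shell
#!/usr/bin/env bash
# Set once in the parent shell, before any job is started.
filename="filename$(date '+%Y_%m_%d_%H%M').txt"

# Hypothetical stand-in for the smartctl call; it only reads $filename.
report_name() {
  printf 'job %s writes to %s\n' "$1" "$filename"
}

for i in a b c; do
  report_name "$i" &   # each background job sees the parent's $filename
done
wait
```

Every job reports the same timestamped name, even if the clock rolls over to a new minute while the jobs are running.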

I assume you can start a new bash and put it in the background? Oh I see, the problem is that the resulting file is garbled. Catch the results in an array or in differently named variables, wait for all children, and copy the variables in the desired order when all are done.
– Peter - Reinstate Monica, Commented Sep 30, 2015 at 5:24
I'm not sure what you mean. The script runs automatically upon a successful network connection to archive the created log file on a remote FTP server.
– k k, Commented Sep 30, 2015 at 5:27
I tried that as well (allowing each forked process to write its own temp file, then concatenating them all into one master before archiving); however, that actually runs slower than simply executing the for loop as written, one drive at a time.
– k k, Commented Sep 30, 2015 at 5:37
What's the point of {a,b,c,...} when a b c ... is both shorter and 100% portable? Don't write bash scripts. Write shell scripts.
– Jens, Commented Sep 30, 2015 at 9:52
@Jens Yes, I'm sure there will always be a way to write a shorter script; however, the question was about a for loop in bash because it's part of a larger script that's already written in ... wait for it... bash. :D
– k k, Commented Oct 1, 2015 at 1:14
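One possible reading of the suggestion above to catch the results, wait, and copy them out in order, sketched without temporary files: each job is started through process substitution on its own file descriptor, and the descriptors are then read back in launch order. This assumes bash 4.1 or later for the {fd} redirection syntax. probe and read_in_order are hypothetical names, with probe standing in for smartctl -a /dev/sd$i; neither is part of the original script.

```shell
#!/usr/bin/env bash
# probe is a hypothetical stand-in for: smartctl -a "/dev/sd$1"
probe() {
  sleep 0.2
  printf 'report for sd%s\n' "$1"
}

read_in_order() {
  local d fd
  local fds=()
  # Start every job at once; each one writes into its own pipe.
  for d in "$@"; do
    exec {fd}< <(probe "$d")
    fds+=("$fd")
  done
  # Drain the pipes in launch order, so the output is sequential.
  for fd in "${fds[@]}"; do
    cat <&"$fd"
    exec {fd}<&-   # close the descriptor once it has been read
  done
}

read_in_order a b c d
```

A job whose output exceeds the pipe buffer (commonly 64 KiB on Linux) blocks until its turn to be read. The result stays correct, but that job stops making progress in the meantime.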

Mark Setchell · Accepted Answer · 2015-09-30 14:49:53Z

2

With GNU Parallel like this:

parallel -k 'smartctl -a /dev/sd{}' ::: a b c d e f g h i j k l m n o p > path/to/output

The -k option keeps the output in order. Add -j 8 if you want to run, say, 8 at a time; otherwise it runs one job per core. Or use -j 16 if you want to run them all at once...

parallel -j 16 -k 'smartctl ....

Of course, if you are in bash you can do this too:

parallel -j 16 -k 'smartctl -a /dev/sd{}' ::: {a..p} > path/to/output
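The effect of -k can be checked without touching any drives. The sketch below assumes GNU Parallel is the installed parallel (other programs with the same name take different arguments) and uses sleep and echo as stand-in jobs; keep_order_demo is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Stand-in jobs: the first argument sleeps longest, so it finishes last.
keep_order_demo() {
  parallel -k -j 3 'sleep {}; echo "slept {}"' ::: 0.3 0.2 0.1
}

# Only run when GNU Parallel is the installed "parallel".
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  keep_order_demo
fi
```

With -k the lines appear in argument order (0.3, 0.2, 0.1) even though the 0.1 job finishes first. Dropping -k lets each line through as soon as its job completes.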

edited Sep 30, 2015 at 14:49

answered Sep 30, 2015 at 8:30

Mark Setchell


2 Comments

k k Over a year ago

This is what I was hoping for. The only change I made was that /dev/{} became /dev/sd{} to reflect the input smartctl was looking for, and I had to install parallel since for some reason it wasn't included in Ubuntu 14.04. Parallelizing the smartctl instances cut down the script's execution time dramatically.

Mark Setchell Over a year ago

Excellent! Sorry I missed the sd part of /dev. Glad it worked out for you.

Gilles · Answer · 2015-09-30 06:12:11Z

1

Wouldn't something like this work? (not tested)

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i > /path/to/location/$filename.$i &
done
wait
cat /path/to/location/$filename.* > /path/to/location/$filename

EDIT: it looks like the final cat is slow, so what about this version?

filename="filename$( date '+%Y_%m_%d_%H%M' ).txt"
tmpdir="/dev/shm/tmp$( date '+%Y_%m_%d_%H%M' )"
mkdir $tmpdir
for i in {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p}
do
smartctl -a /dev/sd$i > $tmpdir/$filename.$i &
done
wait
cat $tmpdir/$filename.* > /path/to/location/$filename
rm -rf $tmpdir
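A variant of the same idea, as a sketch rather than a tested replacement: mktemp -d picks a unique scratch directory, a trap removes it even if the run is interrupted, and the results are concatenated by looping over the drive letters explicitly instead of relying on glob order. probe and collect_in_order are hypothetical names; probe stands in for smartctl -a /dev/sd$i so the example runs without root or real drives.

```shell
#!/usr/bin/env bash
# probe is a hypothetical stand-in for: smartctl -a "/dev/sd$1"
probe() {
  sleep "0.$(( RANDOM % 3 ))"   # finish in a random order
  printf 'report for sd%s\n' "$1"
}

collect_in_order() (
  # The function body is a subshell, so the EXIT trap fires when it returns.
  tmpdir=$(mktemp -d) || exit 1
  trap 'rm -rf "$tmpdir"' EXIT
  for d in "$@"; do
    probe "$d" > "$tmpdir/$d" &   # one file per drive, all jobs at once
  done
  wait                            # block until every job has finished
  for d in "$@"; do
    cat "$tmpdir/$d"              # emit in the order the drives were listed
  done
)

collect_in_order a b c d
```

Redirecting the function's output, for example collect_in_order a b c d > "/path/to/location/$filename", would produce the single ordered log file.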

edited Sep 30, 2015 at 6:12

answered Sep 30, 2015 at 5:58

Gilles


6 Comments

k k Over a year ago

Thank you. That is what was suggested earlier; however, it actually seems to run slower than the original for loop. I'm hoping for a way to parallelize it without having to create and then cat multiple files.

k k Over a year ago

Forgive me if the answer to this is obvious, but is it substantially different to use tempdir to store the forked output before concatenating it all back together, versus storing the files locally before concatenating them the same way? The system this is running on does mount the tmp directory in RAM, so theoretically that would be incrementally faster than a spinning-platter HDD, but it's using an SSD, so I can't imagine there is more than a few milliseconds' difference?

Gilles Over a year ago

IDK... I was just trying to make the final cat faster... But actually, if I read you correctly, it's the many concurrent instances of smartctl that slow down the whole process... So maybe running them in parallel is a bad idea in the first place. Maybe you can determine by hand how many you can run in parallel without losing performance, and stick to that number with a double loop in your script, with only the inner one parallelised...

k k Over a year ago

I don't believe the multiple occurrences of smartctl are the bottleneck. It's running on an 8-core machine at right around 5 GHz, and since there's only one operation per HDD, it seems that running them in parallel would reduce the overall time drastically compared with consecutive instances. Similarly, the HDDs are attached in batches, either via SATA III ports on the motherboard or on PCI-E 2.0 riser cards, so I don't think there is a bandwidth bottleneck anywhere in that line.

Peter - Reinstate Monica Over a year ago

It's possible that smartctl -a is very fast. (I suppose from skimming the man page that it just requests information already available in the disk and doesn't actually trigger any tests.) It's potentially much faster than actual disk access, for example when you want to write the result file. (Although, given enough cache, writing to the disk should also be fast, unless you mounted it with -o sync and turned off the disk cache with hdparm -W0, or you have lots of other I/O going on). So it looks as if writing to disk is the bottleneck... (ct'd)
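The double loop suggested in the comments above, with only the inner loop parallelised, could look roughly like the sketch below. The batch size of 4 is an arbitrary assumption to be tuned by measurement, and probe and run_in_batches are hypothetical names, with probe standing in for smartctl -a /dev/sd$i.

```shell
#!/usr/bin/env bash
# probe is a hypothetical stand-in for: smartctl -a "/dev/sd$1"
probe() {
  sleep 0.1
  printf 'report for sd%s\n' "$1"
}

# Usage: run_in_batches BATCH_SIZE ITEM...
run_in_batches() {
  local batch_size=$1; shift
  local items=("$@")
  local tmpdir start d
  tmpdir=$(mktemp -d) || return 1
  # Outer loop: one pass per batch. Inner loop: the parallel part.
  for (( start = 0; start < ${#items[@]}; start += batch_size )); do
    for d in "${items[@]:start:batch_size}"; do
      probe "$d" > "$tmpdir/$d" &
    done
    wait   # let the whole batch finish before starting the next
  done
  # Emit the results in the order the items were given.
  for d in "${items[@]}"; do
    cat "$tmpdir/$d"
  done
  rm -rf "$tmpdir"
}

run_in_batches 4 a b c d e f g h
```

Timing the function with different batch sizes, for example time run_in_batches 2 ... against time run_in_batches 8 ..., is one way to find the point where extra concurrency stops helping.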
