
I need a shell script that will create a loop to start parallel tasks read in from a file...

Something along the lines of:

#!/bin/bash
mylist=/home/mylist.txt
while IFS= read -r i; do
    cp -rp "$i" /destination &
done < "$mylist"
wait

So what I am trying to do is send a bunch of tasks to the background with "&" for each line in $mylist, then wait for them all to finish before exiting.

However, there may be a lot of lines in the file, so I want to control how many parallel background processes get started; I want to be able to cap it at, say, 5 or 10.

Any ideas?

Thank you

  • Sounds like you want GNU parallel. Commented Jan 14, 2015 at 2:21
  • Any way to do this without adding additional utilities? I know about parallel but I am unable to get the admin to install it. Commented Jan 14, 2015 at 2:27
  • possible duplicate of Parallelize Bash Script with maximum number of processes Commented Jan 14, 2015 at 2:53
  • Unless you're copying to different devices, you're unlikely to get much of a speed boost copying files, and if you have more than a couple, it's likely to be slower. Commented Jan 14, 2015 at 17:16
  • @exxoid What makes you think that you need your admin to install GNU Parallel? Commented Jan 14, 2015 at 22:34
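
One comment above asks how to do this without installing anything extra. A minimal sketch using xargs -P (supported by GNU and BSD xargs, though -P is not strictly POSIX) could look like the following; the temporary directory and file names are only stand-ins for the question's /home/mylist.txt and /destination:

```shell
#!/bin/bash
# Minimal sketch: throttle parallel copies with xargs -P, no extra tools.
# A temporary directory stands in for the real list file and destination,
# so the example is self-contained.
workdir=$(mktemp -d)
mkdir "$workdir/src" "$workdir/dest"
touch "$workdir/src/a" "$workdir/src/b" "$workdir/src/c"
printf '%s\n' "$workdir"/src/* > "$workdir/mylist.txt"

# -P 5 caps xargs at 5 parallel cp processes; -I{} substitutes one
# line of the list per invocation. xargs itself waits for all of them.
xargs -P 5 -I{} cp -rp {} "$workdir/dest" < "$workdir/mylist.txt"
ls "$workdir/dest"
```

With -I{}, each line of the list becomes a single argument, so names containing spaces survive; names with more unusual characters may need a NUL-delimited list and xargs -0 instead.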

2 Answers


Your scheduler will happily let you start many parallel jobs; how many can actually run efficiently depends on your processor. In general you don't have to worry about starting too many processes, because the kernel will schedule them for you. If you still want to limit them, because the number could get absurdly high, you could use something like this (provided each task runs a cp command):

...
while ...; do
    jobs=$(pgrep -c -x cp)                           # count running cp processes
    [[ $jobs -gt 50 ]] && { sleep 100; continue; }   # braces, not a subshell
    ...
done

The number of running cp commands is stored in the jobs variable, and before starting a new iteration the loop checks whether there are already too many. Note that the sleep and continue must run in the current shell ({ ...; }, not a subshell), otherwise continue has no effect on the loop. Also note that continue jumps to a new iteration, so you'd have to keep track of which entries you have already handled. Alternatively you could use wait.
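
The "alternatively use wait" idea can be sketched with bash's wait -n (bash 4.3 or newer), which blocks until any one background job exits. The sleep-and-log jobs below are hypothetical stand-ins for cp -rp, so the sketch is self-contained:

```shell
#!/bin/bash
# Throttle background jobs with wait -n (needs bash >= 4.3).
mylist=$(mktemp)
printf '%s\n' one two three four five six seven > "$mylist"
log=$(mktemp)

max=3
running=0
while IFS= read -r line; do
    # stand-in for: cp -rp "$line" /destination &
    ( sleep 0.1; echo "$line" >> "$log" ) &
    running=$((running + 1))
    if [ "$running" -ge "$max" ]; then
        wait -n                     # block until any one job exits
        running=$((running - 1))
    fi
done < "$mylist"
wait                                # collect the stragglers
wc -l < "$log"                      # all seven entries were processed
```

Unlike the pgrep approach, this never skips a line: the loop simply pauses until a slot frees up, then starts the next job.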

Edit: On a side note, you can pin a process to a specific CPU core using taskset; it may come in handy when you have fewer, more complex commands.
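
On Linux, that taskset pinning looks like the following; the pinned command here is a trivial echo, purely as a stand-in for a long-running cp or compression job:

```shell
#!/bin/bash
# Pin a command to CPU core 0 with taskset (Linux, util-linux).
# Any long-running command (cp, gzip, ...) could be pinned the same way.
if command -v taskset >/dev/null 2>&1; then
    pin_out=$(taskset -c 0 echo "pinned to core 0")
else
    # taskset is Linux-specific; fall back so the demo still runs
    pin_out="taskset not available"
fi
echo "$pin_out"
```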



You are probably looking for something like this using GNU Parallel:

parallel -j10 cp -rp {} /destination :::: /home/mylist.txt

GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.

If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:

[Diagram: simple scheduling]

GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time:

[Diagram: GNU Parallel's scheduling]

Installation

If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:

(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash

For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README

Learn more

See more examples: http://www.gnu.org/software/parallel/man.html

Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1

Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html

Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

3 Comments

Won't your scheduler do this for you anyway?
If you mean the scheduler in the kernel, then the answer is no.
Oh, good to know, I didn't know that. Thanks for the info and the suggestion of the parallel command!
