Run a script on multi cores and parallel processing

Question

I am writing a script which takes a range of parameters from the command line:

script.pl start end 

for ($k1=$start; $k1<$end; $k1 += 0.001) {
  for ($k2=$start; $k2<$end; $k2 += 0.01) {
    for ($k3=$start; $k3<$end; $k3 += 0.001) {
      for ($k4=$start; $k4<$end; $k4 += 0.001) {
        for ($k5=$start; $k5<$end; $k5 += 0.001) {
          ...
        }
      }
    }
  }
}

If I set the parameters from 0 to 1, it takes a long time. The simplest way is to split the range into smaller intervals like

script.pl 0 0.01 
script.pl 0.01 0.02
...
script.pl 0.9 1

Then I have to open 100 screen sessions at the same time!

Can somebody show me how to do this automatically?

I was not sure what the best way would be, which is why I asked. I have 256 cores.

Opening 100 screens at a time on the same machine and running 100 different scripts in them does not parallelize anything. If your machine has 2 cores, the OS would probably allocate the first 50 scripts to core 1 and the next 50 to core 2.
– arunmoezhi, Commented Sep 30, 2014 at 19:55
What are you actually trying to accomplish? This looks like a whole lot of nested iterations... but that doesn't do anything. Anyway, Perl supports both threading and forking for parallel code. Which is the most apt depends very much on what you're trying to accomplish.
– Sobrique, Commented Sep 30, 2014 at 19:57
Have you tried anything? Have you looked into the Perl methods for distributing work across cores? Given your example of how to split them, does a solution not present itself to you? Does being reminded that you can run scripts in the background from the shell with & help any?
– Etan Reisner, Commented Sep 30, 2014 at 19:57
@arunmoezhi That is exactly parallelization. You even said as much in your comment. Currently, presumably, his script runs serially (and on one core unless he's using Perl threading support explicitly or implicitly). Split it up and he is guaranteed to be able to use more than one core if they exist (you said so yourself).
– Etan Reisner, Commented Sep 30, 2014 at 19:59
I wanted to point out that opening 100 screens does not accomplish anything and is a painful way to parallelize things. As you said, using '&' and running it in the background would be a better option. Adding 'nohup' would also be beneficial if the script is going to run for a long time.
– arunmoezhi, Commented Sep 30, 2014 at 20:06

Sobrique · Accepted Answer · 2014-10-01 09:25:02Z

2

The really critical question when looking at parallel code is dependencies. I'm going to assume that - because your script can be subdivided - you're not doing anything complicated inside the loop.

But because you're stepping by 0.001 and nesting 5 loops deep, going from 0 to 1 means a LOT of iterations: about 100,000,000,000,000 of them (taking the 0.01 step on $k2 into account).
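As a quick check of that figure, here is a small sketch in shell arithmetic. It assumes every loop runs over [0, 1), so a 0.001 step gives 1000 values and the 0.01 step on $k2 gives 100. Floating-point accumulation in the real loops may add or drop an iteration at the boundary, so treat the result as approximate. The variable names are illustrative only.

```shell
# Steps per loop over [0, 1): four loops step by 0.001, one ($k2) by 0.01.
fine_steps=1000
coarse_steps=100

# Total executions of the innermost loop body across all five loops.
total=$((fine_steps * fine_steps * fine_steps * fine_steps * coarse_steps))
echo "$total"

# Innermost executions for a single value of the outermost variable ($k1).
per_k1=$((total / fine_steps))
echo "$per_k1"
```

The second number is the amount of work behind each individual value of $k1, which is why that outermost loop is a natural place to split the job.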

To parallelise, I would personally suggest you 'unroll' the outer loop and use Parallel::ForkManager.

E.g.

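use strict;
use warnings;
use Parallel::ForkManager;

# Assumes the range is passed on the command line, as in the question.
my ( $start, $end ) = @ARGV;

my $CPU_count = 256;

my $fork_manager = Parallel::ForkManager->new($CPU_count);

for ( my $k1 = $start; $k1 < $end; $k1 += 0.001 ) {
    # Run outer loop in parallel: the parent gets the child's PID and
    # moves on to the next $k1; the child gets 0 and falls through.
    my $pid = $fork_manager->start and next;

    for ( my $k2 = $start; $k2 < $end; $k2 += 0.01 ) {
        for ( my $k3 = $start; $k3 < $end; $k3 += 0.001 ) {
            for ( my $k4 = $start; $k4 < $end; $k4 += 0.001 ) {
                for ( my $k5 = $start; $k5 < $end; $k5 += 0.001 ) {
                    ...;
                }
            }
        }
    }

    # The child exits here so it does not continue the outer loop.
    $fork_manager->finish;
}

# The parent blocks here until every child has exited.
$fork_manager->wait_all_children;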

What this will do is - for each iteration of that 'outer' loop, fork your process and run the 4 inner loops as a separate process. It'll cap at 256 concurrent processes. You should match this to the number of CPUs you have available.

Bear in mind though - this only really works for trivial 'cpu intensive' tasks. If you're doing much disk IO or trying to share memory this won't work nearly as well.

Also note - if the number of steps on the outer loop is fewer than the number of CPUs it won't parallelise quite so well.

I'd also note - $k2 uses a different step (0.01 rather than 0.001). I've copied that from your source, but it may be a typo.

edited Oct 1, 2014 at 9:25

answered Sep 30, 2014 at 20:39


3 Comments

EpiMan Over a year ago

Thanks a lot, very helpful. And yes, the different step for $k2 is intentional.

EpiMan Over a year ago

I ran into a problem when printing the output of the nested loops to a file. I have already posted about it (stackoverflow.com/questions/26179647/…).

Sobrique Over a year ago

Yes, one of the downsides of using 'fork()' is that you get race conditions when doing IO. Having each child write its results to its own file, e.g. '$k1.results', would do the trick, as $k1 will be different in each fork. But bear in mind IO is often a limiting factor on parallel processing.
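The one-file-per-process idea also applies when the parallel jobs are started from the shell. Below is a minimal sketch, not the asker's actual setup: `worker` is a hypothetical stand-in for script.pl, and the file names and the four sample intervals are made up for illustration.

```shell
# Hypothetical stand-in for script.pl: prints one result line per interval.
worker() {
    echo "result for $1..$2"
}

outdir=$(mktemp -d)

# Launch one background job per interval. Each job writes to its own
# file, so no two processes ever write to the same file.
for i in 0 1 2 3; do
    start="0.0$i"
    end="0.0$((i + 1))"
    worker "$start" "$end" > "$outdir/$start.results" &
done

# Block until every background job has finished.
wait

# Merge the per-job files; the glob expands in sorted order.
cat "$outdir"/*.results > "$outdir/all.txt"
cat "$outdir/all.txt"
```

Merging only after `wait` returns keeps the combined file in a predictable order, whichever job happens to finish first.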

terdon · Answer · 2014-09-30 20:06:56Z

1

I'm not sure what you mean, but this will launch 100 jobs in parallel (the first 99 in the background). Note that it can bring your computer to its knees, depending on your hardware:

$ seq 0 0.01 0.98 | perl -lne 'print "$_ ",$_+0.01' |
    while read start end; do script.pl $start $end & done; script.pl 0.99 1

The idea is to use seq to generate the intervals, piped through a little perl script that prints out the pairs. These are then read by the bash loop and the script is launched with the relevant parameters.

Note, however, that this is far from an elegant way of achieving your goals. You might want to look into GNU Parallel or the various parallelization tools available for Perl itself.
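If launching everything at once is too heavy, the same loop can be throttled in plain shell. The following is only a sketch under some assumptions: `worker` is a hypothetical stand-in for script.pl, `max_jobs` is an arbitrary cap, and the intervals are produced by awk from an integer counter instead of seq plus perl. As in the question, each job would still run all five loops over its own slice only.

```shell
# Hypothetical stand-in for script.pl; it records its two arguments.
outdir=$(mktemp -d)
worker() {
    echo "$1 $2" > "$outdir/$1.out"
}

# Emit "start end" pairs for n equal slices of [0, 1), derived from an
# integer counter so that rounding errors do not accumulate.
intervals() {
    awk -v n="$1" 'BEGIN {
        for (i = 0; i < n; i++) printf "%.2f %.2f\n", i / n, (i + 1) / n
    }'
}

max_jobs=4   # assumed cap; set it to the number of cores available
count=0

# Store the pairs in a file and redirect it into the loop, so the loop
# runs in the current shell and `wait` can see its background jobs.
pairs=$(mktemp)
intervals 10 > "$pairs"

while read -r start end; do
    worker "$start" "$end" &
    count=$((count + 1))
    # After every max_jobs launches, pause until that batch has finished.
    if [ $((count % max_jobs)) -eq 0 ]; then
        wait
    fi
done < "$pairs"

wait   # collect the jobs from the final, possibly partial, batch
ls "$outdir"
```

Batching like this is simple but leaves cores idle while the slowest job of each batch finishes. A tool that refills slots as they free up, such as xargs -P or GNU Parallel, avoids that.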

answered Sep 30, 2014 at 20:06

clt60 · Answer · 2014-09-30 20:26:36Z

1

Variant of terdon's answer:

paste <(seq -w 0 .01 0.99) <(seq -w 0.01 0.01 1) | xargs -n2 -P 255 ./script.pl

will run the 100 jobs, with up to 255 processes in parallel (so all of them at once), in the following form

./script.pl 0.00 0.01
./script.pl 0.01 0.02
...
...
./script.pl 0.98 0.99
./script.pl 0.99 1.00
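To see the xargs behaviour without running the real script, here is a small self-contained sketch. It rests on a few assumptions: worker.sh is a hypothetical stand-in for ./script.pl, the cap of 8 jobs is arbitrary, and the pairs are built from integer hundredths with printf instead of seq. The -P option is not part of POSIX, but GNU and BSD xargs both provide it.

```shell
# Hypothetical stand-in for ./script.pl, saved as a file so that xargs
# can run it as a separate program. It records its two arguments.
workdir=$(mktemp -d)
cat > "$workdir/worker.sh" <<'EOF'
echo "$1 $2" > "$(dirname "$0")/$1.out"
EOF

# Build the 100 "start end" pairs from integer hundredths, so the last
# pair is exactly "0.99 1.00" and nothing runs past the end of the range.
# xargs then passes two arguments per invocation (-n 2) and keeps at
# most 8 jobs running at once (-P 8, an assumed cap).
i=0
while [ "$i" -lt 100 ]; do
    j=$((i + 1))
    printf '%d.%02d %d.%02d\n' \
        $((i / 100)) $((i % 100)) $((j / 100)) $((j % 100))
    i=$j
done | xargs -n 2 -P 8 sh "$workdir/worker.sh"

# Count the result files, one per completed job.
ls "$workdir" | grep -c '\.out$'
```

Unlike a loop that launches everything with &, xargs starts a new job as soon as a running one exits, so the cap set by -P is what limits the load on the machine.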

answered Sep 30, 2014 at 20:26
