
I am writing a script which takes a range of parameters from the command line:

script.pl start end 

my ($start, $end) = @ARGV;

for (my $k1 = $start; $k1 < $end; $k1 += 0.001) {
  for (my $k2 = $start; $k2 < $end; $k2 += 0.01) {
    for (my $k3 = $start; $k3 < $end; $k3 += 0.001) {
      for (my $k4 = $start; $k4 < $end; $k4 += 0.001) {
        for (my $k5 = $start; $k5 < $end; $k5 += 0.001) {
          ...
        }
      }
    }
  }
}

If I set the parameters to run from 0 to 1, it takes a long time. The simplest fix is to split the range into smaller intervals, like

script.pl 0 0.01 
script.pl 0.01 0.02
...
script.pl 0.9 1

Then I have to open 100 screens at the same time!!

Can somebody guide me on how to do this automatically?

I was not sure what the best way would be, which is why I asked. I have 256 cores.

  • Opening 100 screens at a time on the same machine and running 100 different scripts in them does not parallelize anything by itself. If your machine has 2 cores, the OS would probably allocate the first 50 scripts to core 1 and the next 50 to core 2. Commented Sep 30, 2014 at 19:55
  • What are you actually trying to accomplish? This looks like a whole lot of nested iterations... but by itself that doesn't do anything. Anyway, Perl supports both threading and forking for parallel code. Which is the most apt is very much a question of what you're trying to accomplish. Commented Sep 30, 2014 at 19:57
  • Have you tried anything? Have you looked into the perl methods for distributing work across cores? Given your example of how to split them does a solution not present itself to you? Does being reminded that you can run scripts in the background from the shell with & help any? Commented Sep 30, 2014 at 19:57
  • @arunmoezhi That does exactly parallelize it. You even said as much in your comment. Currently, presumably, his script runs serially (and on one core, unless he's using perl threading support explicitly or implicitly). Split it up and he is guaranteed to be able to use more than one core, if they exist (you said so yourself). Commented Sep 30, 2014 at 19:59
  • I wanted to point out that opening 100 screens does not accomplish anything and is a painful way to parallelize things. As you said, using '&' and running the jobs in the background would be a better option, and adding 'nohup' would also be beneficial if the script is going to run for a long time. Commented Sep 30, 2014 at 20:06

3 Answers


The really critical question when looking at parallel code is dependencies. I'm going to assume that, because your script can be subdivided, you're not doing anything complicated inside the loops.

But because you're stepping by 0.001 and five loops deep, going from 0 to 1 means a LOT of iterations: four loops of 1,000 steps and one ($k2) of 100, so 1,000 × 100 × 1,000 × 1,000 × 1,000 = 100,000,000,000,000 of them, to be precise.

To parallelise, I would personally suggest you 'unroll' the outer loop and use Parallel::ForkManager.

E.g.

use strict;
use warnings;
use Parallel::ForkManager;

my ( $start, $end ) = @ARGV;

my $CPU_count = 256;

my $fork_manager = Parallel::ForkManager->new($CPU_count);

for ( my $k1 = $start; $k1 < $end; $k1 += 0.001 ) {
    # Run each iteration of the outer loop in a separate process.
    # In the parent, start() returns the child's PID (true), so the
    # parent skips straight to the next iteration; the child gets 0
    # and falls through to do the work.
    my $pid = $fork_manager->start and next;

    for ( my $k2 = $start; $k2 < $end; $k2 += 0.01 ) {
        for ( my $k3 = $start; $k3 < $end; $k3 += 0.001 ) {
            for ( my $k4 = $start; $k4 < $end; $k4 += 0.001 ) {
                for ( my $k5 = $start; $k5 < $end; $k5 += 0.001 ) {
                    ...;
                }
            }
        }
    }

    # Terminate the child process.
    $fork_manager->finish;
}

# Wait for any remaining children before exiting.
$fork_manager->wait_all_children;

What this will do, for each iteration of that outer loop, is fork your process and run the four inner loops in a separate child process, capped at 256 concurrent processes. You should match this cap to the number of CPUs you have available.

Bear in mind though, this only really works for trivially parallel, CPU-intensive tasks. If you're doing much disk IO or trying to share memory, this won't work nearly as well.

Also note that if the number of steps in the outer loop is fewer than the number of CPUs, it won't parallelise quite so well; one way around that is sketched below.
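
If that becomes a problem, a possible variation (my sketch, not part of the original answer, assuming the same loop bounds and $fork_manager as above) is to fork over the two outermost loops instead, which gives 1,000 × 100 = 100,000 work units to spread across the pool rather than 1,000:

for ( my $k1 = $start; $k1 < $end; $k1 += 0.001 ) {
    for ( my $k2 = $start; $k2 < $end; $k2 += 0.01 ) {
        # Fork once per ($k1, $k2) pair; the parent moves on.
        my $pid = $fork_manager->start and next;

        for ( my $k3 = $start; $k3 < $end; $k3 += 0.001 ) {
            for ( my $k4 = $start; $k4 < $end; $k4 += 0.001 ) {
                for ( my $k5 = $start; $k5 < $end; $k5 += 0.001 ) {
                    ...;
                }
            }
        }

        $fork_manager->finish;
    }
}
$fork_manager->wait_all_children;

The trade-off is one fork() per work unit, so this only pays off while each unit still does a meaningful amount of work.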

I'd also note that $k2 has a coarser step (0.01 rather than 0.001). I've copied that from your source, but it may be a typo.


3 Comments

Thanks a lot, very helpful. The coarser step on $k2 is correct; it's not a typo.
I ran into a problem when printing the output of the nested loops to a file. I already posted about it (stackoverflow.com/questions/26179647/…).
Yes, one of the downsides of using fork() is that you get race conditions when doing IO. Saving the results and writing them out to '$k1.results' would do the trick, as $k1 will be different in each fork; see the sketch below. But bear in mind IO is often a limiting factor in parallel processing.
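
A minimal sketch of that per-fork output file idea (the file name pattern and the $result variable are illustrative, not from the answer):

# Inside the child, before the inner loops: one output file per fork,
# keyed by $k1, so no two forks ever share a filehandle.
open my $out, '>', "$k1.results" or die "Can't open $k1.results: $!";

# In the innermost loop, with $result standing in for whatever the
# real script computes:
print {$out} join( " ", $k1, $k2, $k3, $k4, $k5, $result ), "\n";

# After the loops, still in the child, before $fork_manager->finish:
close $out;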

I'm not sure exactly what you mean, but this will launch 100 jobs in the background in parallel, all at once with no throttling. Note that it can bring your computer to its knees, depending on your hardware:

$ seq 0 0.01 0.98 | perl -lne 'print "$_ ", $_ + 0.01' | 
    while read start end; do script.pl "$start" "$end" & done; script.pl 0.99 1

The idea is to use seq to generate the interval starts, piped through a little perl one-liner that prints out the start/end pairs. These are then read by the bash loop, which launches the script in the background (the &) with the relevant parameters.
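
The same idea as a small stand-alone Perl driver, in case you'd rather skip the shell pipeline (my sketch, not from the answer; like the one-liner, it launches everything at once with no cap on concurrency):

#!/usr/bin/perl
use strict;
use warnings;

# Generate the 100 intervals [0.00, 0.01] .. [0.99, 1.00] and launch
# script.pl in the background for each one (the trailing & makes the
# shell background each job).
for my $i ( 0 .. 99 ) {
    my $lo = sprintf "%.2f", $i / 100;
    my $hi = sprintf "%.2f", ( $i + 1 ) / 100;
    system("./script.pl $lo $hi &");
}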

Note, however, that this is far from an elegant way of achieving your goal. You might want to look into GNU Parallel or the various parallelization tools available for Perl itself.



Variant of terdon's answer:

paste <(seq -w 0 0.01 0.99) <(seq -w 0.01 0.01 1.00) | xargs -n2 -P 255 ./script.pl

will run the 100 jobs through xargs, up to 255 at a time in parallel (here that means all of them at once), invoking the script as

./script.pl 0.00 0.01
./script.pl 0.01 0.02
...
...
./script.pl 0.98 0.99
./script.pl 0.99 1.00

