
I have a Python script that runs many executables written and compiled in C. There is no issue with the executables themselves. However, since I have to run them inside a for loop, I tried to parallelize the loop.

Note: prog1, prog2, prog3 must run in order.
This is a simplified example, but in my real code
prog2 depends on the output of prog1, prog3
depends on the output of prog2, and so on.
I have seven executables in a for loop of 20 iterations,
and it takes more than 2 hours to complete the process.
If I could parallelize the code, it would save a lot of time.
Help would be greatly appreciated!

In my code, example 1 runs fine but example 2 does not. The full code is presented below:

#!/usr/bin/python

from multiprocessing import Pool
import os, sys, subprocess, math, re, shutil, copy

#function to run a program and write output to the shell
################################################################################
def run_process(name, args):
    print "--------------------------------------------------------------------"
    print "Running: %s" % name
    print "Command:"
    for arg in args:
        print arg,
    print ""
    print "--------------------------------------------------------------------"
    process = subprocess.Popen(args)
    process.communicate()
    if process.returncode != 0:
        print "Error: %s did not terminate correctly. Return code: %i." % (name, process.returncode)
        sys.exit(1)  # this will exit the code in case of error
###########################       
# example 1
#run_process("prog1.c", ['./prog1'])
#run_process("prog2.c", ['./prog2'])        
#run_process("prog3.c", ['./prog3', 'first argument'])


# example 2 (parallelizing)
commands = []
for x in range(0,20):
    commands.extend(("prog1.c",['./prog1']))
    commands.extend(("prog2.c",['./prog2']))
    commands.extend(("prog3.c",['./prog3', 'first argument']))


p = Pool()
p.map(run_process, commands)

Here, if I run example 1 it runs flawlessly. But when I try to run example 2, it gives the following error:

    TypeError: run_process() takes exactly 2 arguments (1 given)

Further note:
To create the executables prog1, prog2, and prog3 I wrote C code, which looks like this:

// to compile: gcc -o prog1 prog1.c
// to run: ./prog1
#include <stdio.h>

int main() {
    printf("This is program 1\n");
    return 0;
}

prog2 looks exactly the same, and prog3 looks like this:

// to compile: gcc -o prog3 prog3.c
// to run: ./prog3 'argument1'
#include <stdio.h>

int main(int argc, char **argv) {
    printf("This is program 3\n");
    printf("The argument is = %s\n", argv[1]);
    return 0;
}

Now, there are 21 iterations inside the for loop.
In the first iteration, suppose it runs the executables prog1, prog2, ..., prog7
and finally produces output1.fits.
In the second iteration it again runs the seven executables in order and produces output2.fits.
In the end it creates 21 fits files. What I can do is make four functions:

func1 for loop 0 to 5
func2 for loop 5 to 10
func3 for loop 11 to 15
func4 for loop 16 to 21

Then I want to run these four functions as parallel processes, roughly as in the sketch below.
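This is an untested sketch of that plan; run_pipeline stands in for the seven dependent run_process calls of one iteration, and run_process is the function defined above:

from multiprocessing import Process

def run_pipeline(iteration):
    # the seven dependent executables of one iteration, run in order
    run_process("prog1.c", ['./prog1'])
    run_process("prog2.c", ['./prog2'])
    run_process("prog3.c", ['./prog3', 'first argument'])
    # ... up to prog7, which writes the iteration's fits file

def run_block(start, stop):
    # one worker handles a contiguous block of iterations sequentially
    for i in range(start, stop):
        run_pipeline(i)

if __name__ == '__main__':
    # four workers covering all 21 iterations (func1 to func4 above)
    blocks = [(0, 5), (5, 10), (10, 15), (15, 21)]
    workers = [Process(target=run_block, args=b) for b in blocks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()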
My question is: how can I run example 2 without any error?

  • This might be what you are looking for. Commented Nov 17, 2015 at 19:22
  • You'll normally use a process pool and multiprocessing.map to achieve parallelism while not bringing the system down. Commented Nov 17, 2015 at 19:47
  • Do these things you are running depend on each other? Is this run_process function running subprocesses? Commented Nov 18, 2015 at 15:26
  • If you read the Process documentation, you'd see that you need to call start on the process objects. Commented Nov 20, 2015 at 12:18
  • @tdelaney the processes depend on each other, and run_process is a function defined above in the edited question. Commented Dec 9, 2015 at 2:24

3 Answers


Python has a Pool of processes built exactly for this purpose.

Given that you need to run the same sequence of commands X times, and supposing each whole sequence can run in parallel with the others, the Nth run can execute alongside the (N+1)th without any interference.

import subprocess
from multiprocessing import Pool

commands = (("prog1.c", ['./prog1']), ...)

def run_processes(execution_index):
    print("Running sequence for the %d time." % execution_index)

    for name, args in commands:
        process = subprocess.Popen(args)
        ...

p = Pool()
p.map(run_processes, range(20))

On Python 3 you can use concurrent.futures.ProcessPoolExecutor.
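For example, here is a minimal Python 3 sketch; it uses subprocess.check_call, which raises on a non-zero return code, as a stand-in for the error handling above:

import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_sequence(execution_index):
    # the dependent commands of one sequence still run in order
    for args in (['./prog1'], ['./prog2'], ['./prog3', 'first argument']):
        subprocess.check_call(args)
    return execution_index

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        # map blocks until all 20 independent sequences have finished
        for index in executor.map(run_sequence, range(20)):
            print("Sequence %d done." % index)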

Whenever you want to run something concurrently, you need to understand the execution boundaries first. If two lines of execution are interdependent, you either set up communication between the two (using, for example, a pipe) or avoid running them concurrently.

In your case, the commands within a sequence are interdependent, so it is problematic to run them concurrently. But if the whole sequences are not interdependent with one another, you can run those in parallel.


13 Comments

I tried this with a simple executable as mentioned above, and it did not work! Do you have any idea?
I don't understand. Are you using multiprocessing.Pool.map or multiprocessing.Process?
The map function passes only one argument to the given function. In your case it is a tuple with your two arguments. You can either expand it with the * operator or just treat it as a list.
You didn't get what I mean. run_process will receive only one argument, containing a tuple with the two entries you specified. I'll edit the answer to better show what I mean.
Did you manage to get it working? If not, could you please edit your question showing the code and the exception you get when running it?
import multiprocessing
for x in range(0, 20):
    multiprocessing.Process(target=run_process,
                            args=("colour.c", ['./cl', "color.txt", str(x)]))
    ...

not really sure what else I could add ...
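For completeness, a minimal runnable version of the same idea (run_process is the function from the question; the Process objects need start, and usually join):

import multiprocessing

if __name__ == '__main__':
    workers = []
    for x in range(0, 20):
        p = multiprocessing.Process(target=run_process,
                                    args=("colour.c", ['./cl', "color.txt", str(x)]))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()  # wait for every worker to finish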

1 Comment

that's what this is ... although you cannot "run" C code without first compiling it to an executable ...

Have a look at what the group functions of Celery's canvas do. They allow you to call functions at the same time, each with a different set of arguments. Say you want to process a total of 1000 elements in your for loop; doing so sequentially is highly unoptimized. A simple solution is to call the same function with two sets of arguments at once; even this simple hack can cut your processing time in half. That is what canvas and Celery are about, as the sketch below illustrates.
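A minimal sketch of a Celery group (the broker URL and the task body are assumptions for illustration; any broker Celery supports will work):

from celery import Celery, group

app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker

@app.task
def process_element(index):
    # stand-in for one unit of the for loop's work
    return index * 2

if __name__ == '__main__':
    # queue all 1000 calls at once; workers consume them in parallel
    job = group(process_element.s(i) for i in range(1000))
    result = job.apply_async()
    print(result.get())  # ordered list of the 1000 results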

