
I have a Python script that runs many executables written and compiled in C. There is no issue with the executables themselves. However, since I have to run them inside a for loop, I tried to parallelize the loop.

Note: prog1, prog2, prog3 must run in order.
This is a simplified example, but in my real code
prog2 depends on the output of prog1, prog3
depends on the output of prog2, and so on.
I have seven executables in a for loop of 20 iterations,
and it takes more than 2 hours to complete the process.
If I could parallelize the code, it would save a lot of time.
Help would be greatly appreciated!

In my code, example 1 runs fine but example 2 does not. The full code is presented below:

#!/usr/bin/python

from multiprocessing import Pool
import os, sys, subprocess, math, re, shutil, copy

#function to run a program and write output to the shell
################################################################################
def run_process(name, args):
    print "--------------------------------------------------------------------"
    print "Running: %s" % name
    print "Command:"
    for arg in args:
        print arg,
    print ""
    print "--------------------------------------------------------------------"
    process = subprocess.Popen(args)
    process.communicate()
    if process.returncode != 0:
        print "Error: %s did not terminate correctly. Return code: %i." % (name, process.returncode)
        sys.exit(1)  # this will exit the code in case of error
###########################       
# example 1
#run_process("prog1.c", ['./prog1'])
#run_process("prog2.c", ['./prog2'])        
#run_process("prog3.c", ['./prog3', 'first argument'])


# example 2 (parallelizing)
commands = []
for x in range(0,20):
    commands.extend(("prog1.c",['./prog1']))
    commands.extend(("prog2.c",['./prog2']))
    commands.extend(("prog3.c",['./prog3', 'first argument']))


p = Pool()
p.map(run_process, commands)

Here, if I run example 1 it runs flawlessly. But when I try to run example 2, it gives the following error:

    TypeError: run_process() takes exactly 2 arguments (1 given)

Further note:
To create the executables prog1, prog2, and prog3 I wrote C code, which looks like this:

// to compile: gcc -o prog1 prog1.c
// to run: ./prog1
#include <stdio.h>

int main() {
    printf("This is program 1\n");
    return 0;
}

prog2 looks exactly the same, and prog3 looks like this:

// to compile: gcc -o prog3 prog3.c
// to run: ./prog3 'argument1'
#include <stdio.h>

int main(int argc, char **argv) {
    printf("This is program 3\n");
    printf("The argument is = %s\n", argv[1]);
    return 0;
}

Now, there are 21 iterations inside the for loop.
In the first iteration, suppose it runs the executables prog1, prog2, ..., prog7
and finally produces output1.fits.
In the second iteration it again runs the seven executables in order and produces output2.fits.
In the end it creates 21 fits files. What I can do is make four functions:

func1 for loop 0 to 5
func2 for loop 5 to 10
func3 for loop 11 to 15
func4 for loop 16 to 21

Then I want to run these four functions as parallel processes, roughly as in the sketch below.
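This is an untested sketch of that plan; run_pipeline stands in for the seven dependent run_process calls of one iteration, and run_process is the function defined above:

from multiprocessing import Process

def run_pipeline(iteration):
    # the seven dependent executables of one iteration, run in order
    run_process("prog1.c", ['./prog1'])
    run_process("prog2.c", ['./prog2'])
    run_process("prog3.c", ['./prog3', 'first argument'])
    # ... up to prog7, which writes the iteration's fits file

def run_block(start, stop):
    # one worker handles a contiguous block of iterations sequentially
    for i in range(start, stop):
        run_pipeline(i)

if __name__ == '__main__':
    # four workers covering all 21 iterations (func1 to func4 above)
    blocks = [(0, 5), (5, 10), (10, 15), (15, 21)]
    workers = [Process(target=run_block, args=b) for b in blocks]
    for w in workers:
        w.start()
    for w in workers:
        w.join()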
My question is: how can I run example 2 without any error?

  • This might be what you are looking for. Commented Nov 17, 2015 at 19:22
  • You'll normally use a process pool and multiprocessing.map to achieve parallelism while not bringing the system down. Commented Nov 17, 2015 at 19:47
  • Do these things you are running depend on each other? Is this run_process function running subprocesses? Commented Nov 18, 2015 at 15:26
  • If you read the Process documentation, you'd see that you need to call start on the process objects. Commented Nov 20, 2015 at 12:18
  • @tdelaney the processes depend on each other, and run_process is a function defined above in the edited question. Commented Dec 9, 2015 at 2:24

3 Answers


Python has a Pool of processes built exactly for this purpose.

Given that you need to run the same sequence of commands X times, and supposing each whole sequence can run in parallel with the others, the Nth run can execute alongside the (N+1)th without any interference.

import subprocess
from multiprocessing import Pool

commands = (("prog1.c", ['./prog1']), ...)

def run_processes(execution_index):
    print("Running sequence for the %d time." % execution_index)

    for name, args in commands:
        process = subprocess.Popen(args)
        ...

p = Pool()
p.map(run_processes, range(20))

On Python 3 you can use concurrent.futures.ProcessPoolExecutor.
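For example, here is a minimal Python 3 sketch; it uses subprocess.check_call, which raises on a non-zero return code, as a stand-in for the error handling above:

import subprocess
from concurrent.futures import ProcessPoolExecutor

def run_sequence(execution_index):
    # the dependent commands of one sequence still run in order
    for args in (['./prog1'], ['./prog2'], ['./prog3', 'first argument']):
        subprocess.check_call(args)
    return execution_index

if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        # map blocks until all 20 independent sequences have finished
        for index in executor.map(run_sequence, range(20)):
            print("Sequence %d done." % index)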

Whenever you want to run something concurrently, you need to understand the execution boundaries first. If two lines of execution are interdependent, you either set up communication between the two (using, for example, a pipe) or avoid running them concurrently.

In your case, the commands within a sequence are interdependent, so it is problematic to run them concurrently. But if the whole sequences are not interdependent with one another, you can run those in parallel.


13 Comments

I tried this with a simple executable as mentioned above, and it did not work! Do you have any idea?
I don't understand. Are you using multiprocessing.Pool.map or multiprocessing.Process?
The map function passes only one argument to the given function. In your case it is a tuple with your two arguments. You can either expand it with the * operator or just treat it as a list.
You didn't get what I mean. run_process will receive only one argument, containing a tuple with the two entries you specified. I'll edit the answer to better show what I mean.
Did you manage to get it working? If not, could you please edit your question showing the code and the exception you get when running it?
import multiprocessing
for x in range(0, 20):
    multiprocessing.Process(target=run_process,
                            args=("colour.c", ['./cl', "color.txt", str(x)]))
    ...

not really sure what else I could add ...
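For completeness, a minimal runnable version of the same idea (run_process is the function from the question; the Process objects need start, and usually join):

import multiprocessing

if __name__ == '__main__':
    workers = []
    for x in range(0, 20):
        p = multiprocessing.Process(target=run_process,
                                    args=("colour.c", ['./cl', "color.txt", str(x)]))
        p.start()
        workers.append(p)
    for p in workers:
        p.join()  # wait for every worker to finish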

1 Comment

that's what this is ... although you cannot "run" C code without first compiling it to an executable ...

Have a look at what the group functions of Celery's canvas do. They allow you to call functions at the same time, each with a different set of arguments. Say you want to process a total of 1000 elements in your for loop; doing so sequentially is highly unoptimized. A simple solution is to call the same function with two sets of arguments at once; even this simple hack can cut your processing time in half. That is what canvas and Celery are about, as the sketch below illustrates.
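A minimal sketch of a Celery group (the broker URL and the task body are assumptions for illustration; any broker Celery supports will work):

from celery import Celery, group

app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker

@app.task
def process_element(index):
    # stand-in for one unit of the for loop's work
    return index * 2

if __name__ == '__main__':
    # queue all 1000 calls at once; workers consume them in parallel
    job = group(process_element.s(i) for i in range(1000))
    result = job.apply_async()
    print(result.get())  # ordered list of the 1000 results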

