
I am looking to speed up a slow-running loop, but I do not think I am going about this with the best approach. I would like to parallelize some code that calls a function I have written, and I am having some trouble figuring out exactly how to structure the input parameters for Python's multiprocessing module. The code I have is essentially of the following form:

import numpy as np

a = some_value
c = some_value
d = some_value
for i in range(1,101):
    for j in range(1,101):
        b = np.array([i*0.001,j*0.001]).reshape((2,1))
        (A,B,C,D) = function(a,b,c,d)

So my function itself takes a variety of parameters, but for this particular use I need to vary only one of them (an array of two values) to create a grid of values; all the other inputs are integers. I am familiar with very simple examples of parallelizing such loops using a pool of workers, along the lines of:

import multiprocessing as mp

pool = mp.Pool(processes=4)
input_parameters = ...  # list of iterables for multiprocessing
result = pool.map(paramest.parameter_estimate_ND, input_parameters)

where the list of iterables is created using the itertools module. Since I am changing only one input variable of the function, with all the others fixed beforehand, I am having trouble structuring the input parameters. What I would really like is to use multiprocessing to run different inputs at the same time and speed up the execution of the for loops.
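For concreteness, here is one hedged way to build that list of varying inputs with itertools: only b varies across the grid, so the list holds one (2, 1) array per (i, j) pair (the values mirror the loop above; the real function's fixed arguments are left out of the list entirely).

```python
import itertools

import numpy as np

# Build the 100 x 100 grid of b arrays: one (2, 1) array per (i, j)
# pair. Only b varies; a, c, d stay fixed and need not be in the list.
b_values = [
    np.array([i*0.001, j*0.001]).reshape((2, 1))
    for i, j in itertools.product(range(1, 101), range(1, 101))
]

print(len(b_values))      # 10000 parameter sets
print(b_values[0].shape)  # (2, 1)
```

A list like this could then be passed directly as the second argument of pool.map.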

My question, then: how would one structure multiprocessing to parallelize code that calls a function while changing only specific input variables?

Am I approaching this in the best manner? Is there a better way to do such a thing?

Thank you!

  • Are you just doing math operations, or are you using other Python tools? Because if you are just doing math, there are better tools to speed up your for loop. Commented Aug 27, 2018 at 21:47
  • I am not sure what you mean by other Python 'tools', but the function I am parallelizing takes some input data and does a number of mathematical operations (using NumPy), essentially using the data to test some conditions and create a number of different output arrays. Commented Sep 5, 2018 at 1:46
  • What I mean is that I don't think you need to use processes, because there are already libraries like Numba that can optimize and parallelize your for loop, as long as the function in the inner loop uses only NumPy. Commented Sep 5, 2018 at 21:45

1 Answer


Normally, you only need to worry about parallelizing the inner loop of a nested loop. Assuming each call to function is heavy enough to be worth running as a task, putting 100 of them at a time into a Pool should be more than enough.


So, how do you parallelize that inner loop?

Just turn it into a function:

def wrapper(a, c, d, i, j):
    b = np.array([i*0.001,j*0.001]).reshape((2,1))
    return function(a,b,c,d)

And now:

from functools import partial

for i in range(1,101):
    pfunc = partial(wrapper, a, c, d, i)
    ABCDs = pool.map(pfunc, range(1, 101))

Or, instead of creating a partial, you can even just define the wrapper function inside the i loop:

for i in range(1,101):
    def wrapper(j):
        b = np.array([i*0.001,j*0.001]).reshape((2,1))
        return function(a,b,c,d)
    ABCDs = pool.map(wrapper, range(1, 101))

If you run into problems passing closure variables over the pool's queue, that's easy to fix: you don't actually need to capture the variables, just their values, so:

for i in range(1,101):
    def wrapper(j, *, a=a, c=c, d=d, i=i):
        b = np.array([i*0.001,j*0.001]).reshape((2,1))
        return function(a,b,c,d)
    ABCDs = pool.map(wrapper, range(1, 101))

If it turns out that parallelizing over j alone isn't enough parallelism, you can easily change it to map over (i, j) pairs instead:

def wrapper(ij, *, a=a, c=c, d=d):
    i, j = ij
    b = np.array([i*0.001,j*0.001]).reshape((2,1))
    return function(a,b,c,d)

ABCDs = pool.map(wrapper, itertools.product(range(1, 101), range(1, 101)))

That ABCDs is going to be an iterable of (A, B, C, D) tuples, so most likely whatever it is that you wanted to do with A, B, C, D is just a matter of:

    for A, B, C, D in ABCDs:
        # whatever
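Putting the pieces together, here is a minimal self-contained sketch of the (i, j) version; `function` is a stand-in for the real one, and the fixed values of a, c, d are placeholders. The Pool is created only after wrapper is defined and inside a `__main__` guard, so the worker processes can find everything they need.

```python
import itertools
from multiprocessing import Pool

import numpy as np

def function(a, b, c, d):
    # Stand-in for the real function, returning four values like the original.
    return (a + b[0, 0], b[1, 0], c, d)

a, c, d = 1, 2, 3

def wrapper(ij, a=a, c=c, d=d):
    i, j = ij
    b = np.array([i * 0.001, j * 0.001]).reshape((2, 1))
    return function(a, b, c, d)

if __name__ == '__main__':
    # Create the Pool only after wrapper is defined: workers created
    # at this point need wrapper to already exist.
    with Pool(processes=4) as pool:
        ABCDs = pool.map(
            wrapper, itertools.product(range(1, 101), range(1, 101))
        )
    print(len(ABCDs))  # 10000 results, one per (i, j) pair
```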

1 Comment

Thank you! This worked perfectly! I was having trouble with it at first because I was putting pool = mp.Pool(processes=4) before defining the wrapper function. Simply defining the wrapper before that line fixed everything, and it works exactly how I wanted it to.
