I have a Python script like this:
    from modules import functions

    a = 1
    parameters = par_vals
    for i in range(large_number):
        # do lots of stuff dependent on a, plot stuff, save plots as png
When I run this for a single value of "a" it takes about half an hour and uses only one core of my six-core machine. I want to run this code for 100 different values of "a".
The question is: how can I parallelize this so that all cores are used and all values of "a" are tried?
My first approach, following an online suggestion, was:
    from joblib import Parallel, delayed

    def repeat(a):
        from modules import functions
        parameters = par_vals
        for i in range(large_number):
            # do lots of stuff dependent on a, plot stuff, save plots as png

    A = list_100_a  # list of 100 different a values
    Parallel(n_jobs=6, verbose=0)(delayed(repeat)(a) for a in A)
This successfully used all the cores on my machine, but it appeared to be computing for all 100 values of "a" at the same time. After about 4 hours my 64 GB of RAM and 64 GB of swap were saturated and performance dropped drastically. So I tried to queue the work manually, running the function for six values at a time inside a for loop, but memory was consumed just the same.
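For reference, the manual batching I tried looked roughly like this (a sketch: `repeat` here is a dummy stand-in for the real per-"a" job, and `list(range(100))` stands in for `list_100_a`):

```python
from joblib import Parallel, delayed

def repeat(a):
    # dummy stand-in for the real job (the long plotting loop)
    return a * a

def batches(seq, size):
    # yield successive slices of at most `size` items from seq
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

A = list(range(100))  # stand-in for list_100_a
results = []
for batch in batches(A, 6):
    # each batch of 6 jobs must finish before the next one starts
    results.extend(
        Parallel(n_jobs=6, verbose=0)(delayed(repeat)(a) for a in batch)
    )
```

Even with this structure, where at most six jobs exist at any moment, the memory kept growing batch after batch.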
I don't know where the problem is. I guess the program is somehow holding on to memory it no longer needs. What can I do to avoid this memory problem?
In summary: when I run the function for a single value of "a", everything is fine. When I run it in parallel for six values of "a", everything is fine. But when I run those parallel batches sequentially, memory usage gradually increases until the computer can no longer work.
UPDATE: I found a solution to the memory problem, even though I don't understand why it works. Switching matplotlib's backend to 'Agg' made the memory problem go away.
Just add this before importing matplotlib.pyplot and you should be fine:
    from matplotlib import use
    use('Agg')
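For what it's worth, the commonly cited explanation is that pyplot keeps every figure it creates alive in internal state until the figure is closed, so figures created in a long loop accumulate. Closing each figure explicitly after saving releases that memory regardless of backend; a minimal sketch (the filenames and plotted data are placeholders):

```python
from matplotlib import use
use('Agg')                       # select the non-interactive backend first
import matplotlib.pyplot as plt

for i in range(3):               # stands in for the long loop over i
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, i])      # placeholder plot
    fig.savefig(f'plot_{i}.png')
    plt.close(fig)               # release the figure's memory explicitly
```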
The memory growth happened both with joblib and when using multiprocessing directly.