Python Multiprocessing Troubleshooting

Question

Recently I wanted to speed up some of my code using parallel processing, as I have a quad-core i7 and leaving three cores idle seemed like a waste. I learned about Python's GIL (I'm using v3.3.2, if it matters) and how it can be overcome using the multiprocessing module, so I wrote this simple test program:

from multiprocessing import Process, Queue

def sum(a,b):
    su=0
    for i in range(a,b):
        su+=i
    q.put(su)

q= Queue()

p1=Process(target=sum, args=(1,25*10**7))
p2=Process(target=sum, args=(25*10**7,5*10**8))
p3=Process(target=sum, args=(5*10**8,75*10**7))
p4=Process(target=sum, args=(75*10**7,10**9))

p1.run()
p2.run()
p3.run()
p4.run()

r1=q.get()
r2=q.get()
r3=q.get()
r4=q.get()

print(r1+r2+r3+r4)

The code runs in about 48 seconds, measured using cProfile; however, the single-process code

def sum(a,b):
    su=0
    for i in range(a,b):
        su+=i
    print(su)

sum(1,10**9)

runs in about 50 seconds. I understand that the method has overheads, but I expected the improvement to be more drastic. The error with fork() doesn't apply to me, as I'm running the code on a Mac.

Did you watch CPU load during the parallelized run? Were several cores loaded?
– 9000, Commented Jan 9, 2014 at 21:24
Yes, the activity spiked in all 4 cores; curiously, the same happened in the sequential case. Activity Monitor also claims Python is using only 1 thread, switching to 2 about halfway through the calculation (in the parallel case).
– Michal, Commented Jan 9, 2014 at 21:36
multiprocessing starts separate processes, which get separate rows in Activity Monitor (generally all called "Python").
– abarnert, Commented Jan 9, 2014 at 21:52
Also, it's worth noting that for code like this it's usually easier to use either multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor than explicit Processes and Queues. For example, compare this.
– abarnert, Commented Jan 9, 2014 at 22:04
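
The ProcessPoolExecutor route mentioned in the last comment might look roughly like the sketch below. It is not the code the comment linked to; the helper names range_sum and chunk_bounds are made up for illustration, and the range is much smaller than the question's 10**9 so the demo finishes quickly.

```python
from concurrent.futures import ProcessPoolExecutor


def range_sum(bounds):
    """Sum the integers in range(lo, hi) for a (lo, hi) tuple."""
    lo, hi = bounds
    total = 0
    for i in range(lo, hi):
        total += i
    return total


def chunk_bounds(start, stop, parts):
    """Split range(start, stop) into `parts` contiguous (lo, hi) pairs."""
    step = (stop - start) // parts
    edges = [start + k * step for k in range(parts)] + [stop]
    return list(zip(edges, edges[1:]))


if __name__ == "__main__":
    chunks = chunk_bounds(1, 10**6, 4)
    # The executor starts the worker processes and collects return values,
    # so no explicit Process or Queue objects are needed.
    with ProcessPoolExecutor(max_workers=4) as executor:
        print(sum(executor.map(range_sum, chunks)))
```

Because the workers return their results instead of writing to a shared queue, there is no run/start call to get wrong here.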

abarnert · Accepted Answer · 2014-01-09 21:54:26Z

6

The problem is that you're calling run rather than start.

If you read the docs, run is the "Method representing the process's activity", while start is the function that starts the process's activity on the background process. (This is the same as with threading.Thread.)

So, what you're doing is running the sum function on the main process, and never doing anything on the background processes.
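
One way to see the difference for yourself is to have the target report which process it ran in. This is only an illustrative sketch; report_pid and pid_via are hypothetical helper names, not part of multiprocessing.

```python
import os
from multiprocessing import Process, Queue


def report_pid(q):
    """Put the PID of whichever process executes this function on q."""
    q.put(os.getpid())


def pid_via(method_name):
    """Invoke report_pid through Process.run or Process.start; return the PID seen."""
    q = Queue()
    p = Process(target=report_pid, args=(q,))
    getattr(p, method_name)()   # p.run() or p.start()
    pid = q.get()
    if method_name == "start":
        p.join()                # only a started process can be joined
    return pid


if __name__ == "__main__":
    print("parent pid:        ", os.getpid())
    print("pid under run():   ", pid_via("run"))    # same as the parent
    print("pid under start(): ", pid_via("start"))  # a separate child process
```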

From timing tests on my laptop, calling start instead of run cuts the time to about 37% of the original. Not quite the 25% you'd hope for, and I'm not sure why, but… good enough to prove that it's really multi-processing. (That, and the fact that I get four extra Python processes each using 60-100% CPU…)
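
For reference, a minimal sketch of the question's program with start in place of run. A few things here are my own choices rather than anything from the question: the worker is renamed partial_sum to avoid shadowing the built-in sum, the queue is passed as an argument so it also works where child processes are spawned rather than forked, and the range is shrunk so the demo runs quickly.

```python
from multiprocessing import Process, Queue


def partial_sum(a, b, q):
    """Sum the integers in range(a, b) and put the result on queue q."""
    total = 0
    for i in range(a, b):
        total += i
    q.put(total)


def parallel_sum(bounds):
    """Sum range(bounds[0], bounds[-1]) with one process per adjacent pair."""
    q = Queue()
    procs = [Process(target=partial_sum, args=(lo, hi, q))
             for lo, hi in zip(bounds, bounds[1:])]
    for p in procs:
        p.start()                       # launches a child; run() would not
    results = [q.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    return sum(results)


if __name__ == "__main__":
    # Much smaller than the question's 10**9, purely to keep the demo short.
    print(parallel_sum([1, 250_000, 500_000, 750_000, 1_000_000]))
```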


2 Comments

Michal Over a year ago

Wow, down to 20 seconds. Not quite the 4x I was hoping for, but welcome anyway. :D

abarnert Over a year ago

@Michal: Looks like I was running the same test as you at the same time, and got almost the same results (37% vs. 40%). I am mildly curious what's going on (there's absolutely no contention during the bulk of the work, very little memory to use, …), but not enough to dig in too deeply.

jb. · Accepted Answer · 2014-01-09 22:07:54Z

2

If you really want to write fast computations in Python, a plain Python loop is not the way to go. Use numpy or cython; your computations can be on the order of a hundred times faster than plain Python.

On the other hand, if you just want to launch a bunch of parallel jobs, use the proper tools for it, for example

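from multiprocessing import Pool

def mysum(a,b):
    # Sum the integers in range(a, b).
    su=0
    for i in range(a,b):
        su+=i
    return su

if __name__ == "__main__":
    # The guard matters on platforms that spawn workers instead of forking.
    with Pool() as pool:
        print(sum(pool.starmap(mysum, ((1,25*10**7),
                                       (25*10**7,5*10**8),
                                       (5*10**8,75*10**7),
                                       (75*10**7,10**9)))))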

edited Jan 9, 2014 at 22:07

answered Jan 9, 2014 at 22:01


3 Comments

abarnert Over a year ago

Your example doesn't actually return the values, so there's no way to print them out at the end. See here for code that does (and also shuts down the pool cleanly).

jb. Over a year ago

Yours is way better ;)

abarnert Over a year ago

On the other hand, your original way would be easier to adapt to imap_unordered (which might be worth doing in this case—no reason to fetch the results back in order if all we're doing is adding them), since there's no istarmap_unordered…
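
A rough sketch of what the imap_unordered variant could look like. Since imap_unordered passes a single argument per task, the worker takes a (a, b) tuple and unpacks it; mysum_pair is a hypothetical name, and the chunk sizes are reduced so the demo is quick.

```python
from multiprocessing import Pool


def mysum_pair(bounds):
    """Sum range(a, b) for an (a, b) tuple, one tuple per task."""
    a, b = bounds
    su = 0
    for i in range(a, b):
        su += i
    return su


if __name__ == "__main__":
    chunks = [(1, 250_000), (250_000, 500_000),
              (500_000, 750_000), (750_000, 1_000_000)]
    with Pool() as pool:
        # Results arrive in completion order, which is fine for a sum.
        print(sum(pool.imap_unordered(mysum_pair, chunks)))
```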
