
Recently I wanted to speed up some of my code using parallel processing, since I have a quad-core i7 and it seemed a waste not to use it. I learned about Python's GIL (I'm using v3.3.2, if it matters) and how it can be overcome using the multiprocessing module, so I wrote this simple test program:

from multiprocessing import Process, Queue

def sum(a,b):
    su=0
    for i in range(a,b):
        su+=i
    q.put(su)

q= Queue()

p1=Process(target=sum, args=(1,25*10**7))
p2=Process(target=sum, args=(25*10**7,5*10**8))
p3=Process(target=sum, args=(5*10**8,75*10**7))
p4=Process(target=sum, args=(75*10**7,10**9))

p1.run()
p2.run()
p3.run()
p4.run()

r1=q.get()
r2=q.get()
r3=q.get()
r4=q.get()

print(r1+r2+r3+r4)

The code runs in about 48 seconds (measured using cProfile); however, the single-process code

def sum(a,b):
    su=0
    for i in range(a,b):
        su+=i
    print(su)

sum(1,10**9)

runs in about 50 seconds. I understand that the method has overheads, but I expected the improvement to be more drastic. The fork() issue doesn't apply to me, as I'm running the code on a Mac.
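As a side check (not part of the original question): both versions should print the same total, and the sum of the integers a..b-1 has a closed form, n(a+b-1)/2 with n = b-a terms. A minimal sketch for verifying the output, where range_sum is a hypothetical helper name:

```python
def range_sum(a, b):
    """Closed-form sum of the integers a..b-1 (Gauss's formula)."""
    n = b - a  # number of terms
    # n * (a + b - 1) is always even, so integer division is exact
    return n * (a + b - 1) // 2

print(range_sum(1, 10**9))  # prints 499999999500000000
```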

  • Did you watch CPU load during the parallelized run? Were several cores loaded? Commented Jan 9, 2014 at 21:24
  • Yes, activity spiked on all 4 cores; curiously, the same happened in the sequential case. Activity Monitor also claims Python is using only 1 thread, switching to 2 about halfway through the calculation (in the parallel case). Commented Jan 9, 2014 at 21:36
  • multiprocessing starts separate processes, which get separate rows in Activity Monitor (generally all called "Python"). Commented Jan 9, 2014 at 21:52
  • Also, it's worth noting that for code like this it's usually easier to use either multiprocessing.Pool or concurrent.futures.ProcessPoolExecutor than explicit Processes and Queues. For example, compare this. Commented Jan 9, 2014 at 22:04
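For reference, a minimal sketch of the concurrent.futures.ProcessPoolExecutor approach mentioned in the last comment, assuming the same four-way split as the question. partial_sum is a hypothetical name (chosen to avoid shadowing the built-in sum), and N is scaled down so the sketch finishes quickly:

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(a, b):
    """Return the sum of the integers a..b-1 using a plain Python loop."""
    su = 0
    for i in range(a, b):
        su += i
    return su

if __name__ == "__main__":
    N = 10**7  # scaled down; the question uses 10**9
    starts = [1, N // 4, N // 2, 3 * N // 4]
    stops = [N // 4, N // 2, 3 * N // 4, N]
    with ProcessPoolExecutor(max_workers=4) as ex:
        # Executor.map zips several iterables, like the built-in map
        print(sum(ex.map(partial_sum, starts, stops)))  # prints 49999995000000
```

The `if __name__ == "__main__":` guard keeps worker processes from re-running the launch code when they import the module, which matters under the spawn start method.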

2 Answers


The problem is that you're calling run rather than start.

If you read the docs, run is the "Method representing the process's activity", while start is the function that starts the process's activity on the background process. (This is the same as with threading.Thread.)

So what you're doing is running the sum function four times, one after another, in the main process, and never doing anything in the background processes.
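A corrected sketch of the question's program that calls start() instead of run(). Two further changes are my own portability assumptions rather than part of the fix: the queue is passed as an argument instead of being a global, and the launch code sits under `if __name__ == "__main__":`, so it also works with the spawn start method (the default on Windows, and on macOS since Python 3.8). Results are read from the queue before join(), since the multiprocessing docs warn that joining a process that still has buffered queue data can deadlock. worker and partial_sum are hypothetical names, and N is scaled down from 10**9:

```python
from multiprocessing import Process, Queue

def partial_sum(a, b):
    """Sum the integers a..b-1 in a plain Python loop."""
    su = 0
    for i in range(a, b):
        su += i
    return su

def worker(a, b, q):
    """Child-process entry point: compute one partial sum and send it back."""
    q.put(partial_sum(a, b))

if __name__ == "__main__":
    N = 10**7  # scaled down so the sketch runs quickly; the question uses 10**9
    q = Queue()
    bounds = [(1, N // 4), (N // 4, N // 2), (N // 2, 3 * N // 4), (3 * N // 4, N)]
    procs = [Process(target=worker, args=(a, b, q)) for a, b in bounds]
    for p in procs:
        p.start()  # start(), not run(): worker executes in a separate process
    total = sum(q.get() for _ in procs)  # drain the queue before joining
    for p in procs:
        p.join()
    print(total)  # prints 49999995000000
```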

From timing tests on my laptop, this cuts the time to about 37% of the original. Not quite the 25% you'd hope for, and I'm not sure why, but… good enough to prove that it's really multi-processing. (That, and the fact that I get four extra Python processes each using 60-100% CPU…)

2 Comments

Wow, down to 20 seconds. Not quite the 4x I was hoping for, but welcome anyway. :D
@Michal: Looks like I was running the same test as you at the same time, and got almost the same results (37% vs. 40%). I am mildly curious what's going on (there's absolutely no contention during the bulk of the work, very little memory to use, …), but not enough to dig in too deeply.

If you really want fast numerical computation in Python, pure-Python loops are not the way to go. Use NumPy or Cython; your computations can often be hundreds of times faster than plain Python.
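A minimal sketch of the NumPy route for this particular sum, assuming NumPy is installed. numpy_range_sum is a hypothetical name, and the chunking is my own choice so that a 10**9-element int64 array (about 8 GB) is never built in one go:

```python
import numpy as np

def numpy_range_sum(a, b, chunk=10**6):
    """Sum the integers a..b-1 with NumPy, one chunk at a time to bound memory use."""
    total = 0
    for start in range(a, b, chunk):
        stop = min(start + chunk, b)
        # arange + sum run in compiled code; int64 is wide enough for each chunk's sum
        total += int(np.arange(start, stop, dtype=np.int64).sum())
    return total  # accumulated as a Python int, so the grand total cannot overflow

if __name__ == "__main__":
    print(numpy_range_sum(1, 10**9))  # prints 499999999500000000
```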

On the other hand, if you just want to launch a bunch of parallel jobs, use the proper tools for it, for example:

from multiprocessing import Pool

def mysum(a, b):
    su = 0
    for i in range(a, b):
        su += i
    return su

if __name__ == "__main__":
    with Pool() as pool:
        # starmap unpacks each (a, b) tuple into mysum(a, b)
        print(sum(pool.starmap(mysum, ((1, 25*10**7),
                                       (25*10**7, 5*10**8),
                                       (5*10**8, 75*10**7),
                                       (75*10**7, 10**9)))))

3 Comments

Your example doesn't actually return the values, so there's no way to print them out at the end. See here for code that does (and also shuts down the pool cleanly).
Yours is way better ;)
On the other hand, your original way would be easier to adapt to imap_unordered (which might be worth doing in this case—no reason to fetch the results back in order if all we're doing is adding them), since there's no istarmap_unordered
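A minimal sketch of the imap_unordered variant discussed above. Since imap_unordered passes a single argument, the worker unpacks each (a, b) tuple itself; mysum_pair is a hypothetical name, and N is scaled down from the question's 10**9:

```python
from multiprocessing import Pool

def mysum_pair(bounds):
    """Unpack one (a, b) pair and sum a..b-1; imap_unordered passes a single argument."""
    a, b = bounds
    su = 0
    for i in range(a, b):
        su += i
    return su

if __name__ == "__main__":
    N = 10**7  # scaled down; the question uses 10**9
    ranges = [(1, N // 4), (N // 4, N // 2), (N // 2, 3 * N // 4), (3 * N // 4, N)]
    with Pool(4) as pool:
        # results arrive in completion order, which doesn't matter for a sum
        print(sum(pool.imap_unordered(mysum_pair, ranges)))  # prints 49999995000000
```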
