
So I knocked up some test code to see how the multiprocessing module would scale on CPU-bound work compared to threading. On Linux I get the performance increase that I'd expect:

linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms

My dual core MacBook Pro shows the same behaviour:

osx (dual core macbook pro):
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms

I then went and tried it on a Windows machine and got some very different results:

windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms

Why, oh why, is the multiprocessing approach so much slower on Windows?

Here's the test code:

#!/usr/bin/env python

import multiprocessing
import threading
import time

# Decorator that reports how long the wrapped call took, in milliseconds.
def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2-t1)*1000.0)
        return res
    return wrapper


# CPU-bound busy loop used as the unit of work in all three tests.
def counter():
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()
    
    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()
    
    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()
1 Comment
I ran your test code on a quad core Dell PowerEdge 840 running Win2K3, and the results weren't as dramatic as yours, but your point remains valid:

serialrun took 1266.000 ms
parallelrun took 1906.000 ms
threadedrun took 4359.000 ms

I'll be interested to see what answers you get. I don't know myself. Commented Aug 17, 2009 at 19:16

5 Answers


The Python documentation for multiprocessing blames the lack of os.fork() for the problems on Windows. It may be applicable here.
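A side note on what the missing fork() means in practice: on Windows each child process starts a fresh interpreter and re-imports the main module, which is why the __main__ guard in the test script matters. A minimal sketch of mine (not from the docs) that makes the difference visible:

import multiprocessing

print 'module imported'   # on Windows this prints once per child, on Linux only once

def work():
    pass

if __name__ == '__main__':
    procs = [multiprocessing.Process(target=work) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()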

See what happens when you import psyco. First, easy_install it:

C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco

Add this to the top of your Python script:

import psyco
psyco.full()

I get these results without:

serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms

I get these results with:

serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms

Parallel is still slow, but the others burn rubber.

Edit: also, try it with the multiprocessing Pool. (This is my first time trying this, and it is so fast that I figure I must be missing something.)

@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    result = pool.apply_async(counter, (reps,))

Results:

C:\Users\hughdbrown\Documents\python\StackOverflow>python  1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms

2 Comments

Very neat! Lowering the number of iterations (processes) while raising the count-to value shows, as Byron said, that the parallel slowness comes from the added setup time of Windows processes.
The Pool does not seem to wait for the work to complete; there is a join() method on Pool, but it doesn't seem to do what I think it should do :P.
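For what it's worth, the parallelpoolrun above never waits for (or fetches) the AsyncResult, and it also passes an argument that counter() does not accept, so the error raised in the worker is silently swallowed; that largely explains the suspiciously fast time. A rough sketch of a version that actually waits, reusing counter() and print_timing from the question:

@print_timing
def parallelpoolrun(x):
    pool = multiprocessing.Pool(processes=4)
    # counter() takes no arguments, so don't pass any
    results = [pool.apply_async(counter) for i in xrange(x)]
    for r in results:
        r.get()        # blocks until that task finishes and re-raises any worker error
    pool.close()
    pool.join()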

Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of doing parallel work on Windows.
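To put a rough number on that start-up cost, here is a small sketch of mine that times nothing but creating and joining 50 processes with a no-op target, so almost all of the measured time is process start-up and teardown:

#!/usr/bin/env python
import multiprocessing
import time

def noop():
    pass

if __name__ == '__main__':
    t1 = time.time()
    procs = [multiprocessing.Process(target=noop) for i in xrange(50)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print 'start/join of 50 no-op processes took %0.3f ms' % ((time.time() - t1) * 1000.0)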

2 Comments

Oh, interesting. Would that mean that a change to the balance of the test, say counting higher but fewer times, would let Windows reclaim some multiprocessing performance? I shall give it a go.
Tried recalibrating to counting to 10,000,000 with 8 iterations, and the results are more in Windows' favor:

serialrun took 1651.000 ms
parallelrun took 696.000 ms
threadedrun took 3665.000 ms
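For reference, the recalibration described above amounts to something like the following sketch (the limit parameter and default values are my own phrasing of "count to 10,000,000 with 8 processes"):

def counter(limit=10000000):
    # bigger per-process workload, so the fixed start-up cost is amortised
    for i in xrange(limit):
        pass

@print_timing
def parallelrun(x=8):
    proclist = [multiprocessing.Process(target=counter) for i in xrange(x)]
    for p in proclist:
        p.start()
    for p in proclist:
        p.join()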

It's been said that creating processes on Windows is more expensive than on Linux. If you search around the site you will find some information; here's one I found easily.



Just starting the pool takes a long time. I have found in 'real world' programs that if I can keep a pool open and reuse it for many different tasks, passing the reference down through method calls (usually using map_async), then on Linux I can save a few percent, but on Windows I can often halve the time taken. Linux is always quicker for my particular problems, but even on Windows I get net benefits from multiprocessing.
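A rough sketch of that pattern under my own naming (crunch and run_batch are illustrative, not from this answer): create the Pool once, then hand the same pool object to every batch of work instead of paying the start-up cost each time.

import multiprocessing

def crunch(n):
    total = 0
    for i in xrange(n):
        total += i
    return total

def run_batch(pool, jobs):
    # reuse the already-started pool; map_async returns an AsyncResult we block on
    return pool.map_async(crunch, jobs).get()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    try:
        for batch in ([1000000] * 8, [2000000] * 8):
            print run_batch(pool, batch)
    finally:
        pool.close()
        pool.join()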



Currently, your counter() function is not modifying much state. Try changing counter() so that it modifies many pages of memory, then run a CPU-bound loop. See if there is still a large disparity between Linux and Windows.

I'm not running Python 2.6 right now, so I can't try it myself.
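If it helps, a minimal sketch of what such a memory-touching counter() might look like (the buffer size is an arbitrary choice of mine):

def counter():
    data = [0] * (4 * 1024 * 1024)    # tens of MB, spread across many pages
    for i in xrange(1000000):
        data[i % len(data)] += 1      # modify state on many different pages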

