I am trying to implement multiprocessing with Python. It works when pooling very quick tasks; however, it freezes when pooling longer tasks. See my example below:

from multiprocessing import Pool
import math
import time

def iter_count(addition):
    print "starting ", addition
    for i in range(1,99999999+addition):
        if i==99999999:  
            print "completed ", addition
            break

if __name__ == '__main__':
    print "starting pooling "
    pool = Pool(processes=2)
    time_start = time.time()
    possibleFactors = range(1,3)   

    try: 
        pool.map( iter_count, possibleFactors)
    except:
        print "exception"

    pool.close()
    pool.join()      

    #iter_count(1)
    #iter_count(2)
    time_end = time.time()
    print "total loading time is : ", round(time_end-time_start, 4)," seconds"

In this example, if I use a smaller number in the for loop (something like 9999999), it works. But when running with 99999999 it freezes. I tried running the two tasks (iter_count(1) and iter_count(2)) in sequence, and together they take about 28 seconds, so it is not really a big job. But when I pool them, the program freezes. I know that there are some known bugs in Python around multiprocessing; however, in my case, the same code works for smaller subtasks but freezes for bigger ones.

3 Comments

  • What version of Python are you using? Some of those known bugs in multiprocessing you referred to were fixed in 2.7, or in later 2.6.x or 2.7.x versions, but if you're using a version from before those fixes, you obviously still have those bugs. And generally, multiprocessing/multithreading bugs are the kind of thing that only happen one time in a million or less, so it wouldn't be all that surprising if N usually works but 10N usually fails. Commented Jan 8, 2014 at 22:23
  • I am using Python version 2.7.5. Commented Jan 8, 2014 at 22:43
  • I seem to remember having similar issues at some point in the past when my worker processes were doing lots of writing to stdout. Have you tried removing the print statements? Commented Jan 9, 2014 at 0:19

1 Answer

You're using some version of Python 2; we can tell because of how print is spelled.

So range(1,99999999+addition) is creating a gigantic list, with at least 100 million integers. And you're doing that in 2 worker processes simultaneously. I bet your disk is grinding itself to dust while the OS swaps out everything it can ;-)
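
To get a feel for the scale, here is a small sketch of the memory difference (byte counts assume 64-bit CPython 2 and vary by platform; N is kept deliberately small so the demo itself stays cheap):

import sys

N = 10**6  # small on purpose; the list's cost grows linearly with N
eager = range(1, N)    # Python 2: builds the entire list up front
lazy = xrange(1, N)    # Python 2: a tiny object that yields values on demand

print "list object:  ", sys.getsizeof(eager), "bytes (plus ~24 bytes per int it references)"
print "xrange object:", sys.getsizeof(lazy), "bytes, no matter how large N is"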

Change range to xrange and see what happens. I bet it will work fine then.
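
For reference, that one-word change is all the fix amounts to in the worker (a minimal sketch; the rest of the script stays exactly as posted):

def iter_count(addition):
    print "starting ", addition
    # xrange yields one integer at a time instead of materializing a
    # ~100-million-element list, so peak memory per worker stays tiny
    for i in xrange(1, 99999999 + addition):
        if i == 99999999:
            print "completed ", addition
            break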

3 Comments

When I change range to xrange, yes, it works. However, what I don't understand is how it can work when I run those tasks sequentially but freeze when I run them in parallel. And overall, we are not talking about a complicated calculation; both tasks take about 30 seconds.
It has nothing to do with the calculations: it has entirely to do with peak memory use. And your program wasn't freezing, it was just running extremely slowly because you were out of RAM. When you do them serially, it takes half the RAM. You were just lucky then. Those gigantic lists require gigabytes of RAM (a rough estimate is sketched after these comments). xrange gives you an iterator instead of a giant list, and requires a tiny amount of RAM. That's all there is to it.
I love it the way one problem can masquerade as another.
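
As a hedged back-of-envelope for the RAM claim above (assuming 64-bit CPython 2, where a small int object takes about 24 bytes and each list slot holds an 8-byte pointer; exact sizes vary by platform and build):

per_int = 24           # approx. bytes per Python 2 int object on 64-bit
per_slot = 8           # bytes per list slot (a pointer)
n = 99999999           # elements in each worker's range() list
one_list = n * (per_int + per_slot)
print "one worker's list  : ~%.1f GB" % (one_list / 1e9)      # ~3.2 GB
print "two workers at once: ~%.1f GB" % (2 * one_list / 1e9)  # ~6.4 GB

This lines up with the symptom: running the tasks serially needs roughly half the peak memory of running them in parallel, which can be the difference between fitting in RAM and swapping.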
