
I have this code that I tried to make parallel based on a previous question. Here is the code using 2 processes.

import multiprocessing
import timeit

start_time = timeit.default_timer()

d1 = dict( (i,tuple([i*0.1,i*0.2,i*0.3])) for i in range(500000) )
d2={}

def fun1(gn):
    x,y,z = d1[gn]
    d2.update({gn:((x+y+z)/3)})

#
if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    #fun1(gen1)
    p= multiprocessing.Pool(2)
    p.map(fun1,gen1)
    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)

Output is:

Script finished
1.8478448875989333

If I change the program to run sequentially,

fun1(gen1)
#p= multiprocessing.Pool(2)
#p.map(fun1,gen1)

output is:

Script finished
0.8345944193950299

So the parallel loop is taking more time than the sequential loop, more than double. (My computer has 2 cores and runs Windows.) I tried to find similar questions on the topic, this and this, but could not figure out the reason. How can I get a performance improvement using the multiprocessing module in this example?

  • If all you're doing is adding numbers, that's probably not an expensive enough operation to overcome the overhead of creating the pool. Commented Aug 12, 2017 at 15:27
  • fun1(gen1) is not equivalent to the multiprocessing code. Commented Aug 12, 2017 at 15:29

1 Answer


When you do p.map(fun1, gen1), the arguments have to be sent over to the worker processes. That means pickling the 500,000 keys in gen1 and shipping them across process boundaries.

Compared with that serialization and inter-process communication, the actual computation, averaging three numbers per key, is tiny, so the overhead dominates the total time.
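
To see that imbalance directly, here is a minimal, self-contained sketch (it reuses the data shape from the question but is otherwise separate from it, and relies only on the standard pickle and timeit modules) comparing the time to serialize the key list against the time the whole computation takes; exact numbers will vary by machine.

import pickle
import timeit

# Same data shape as in the question
d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))
keys = list(d1.keys())

# Cost of serializing the arguments -- a rough stand-in for what
# p.map has to ship to the worker processes
t_pickle = timeit.timeit(lambda: pickle.dumps(keys), number=1)

# Cost of doing the entire computation sequentially
t_compute = timeit.timeit(lambda: {k: sum(d1[k]) / 3 for k in keys}, number=1)

print('serialize the keys:', t_pickle)
print('whole computation: ', t_compute)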

You can measure or profile the full script the same way to confirm where the time goes. To get an actual speedup from multiprocessing, each call sent to a worker has to do enough work to outweigh that overhead; one way to restructure the example is sketched below.
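
The following is only one possible restructuring, not the definitive fix, and the chunksize value of 50000 is an arbitrary choice for illustration. The worker returns its result instead of updating a global dict (on Windows, d2.update inside a child process never reaches the parent), and map() gets a large chunksize so the keys travel in a few big batches rather than one pickled message per item.

import multiprocessing
import timeit

# Rebuilt in every worker process when the module is re-imported
# under the Windows "spawn" start method
d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))

def fun1(gn):
    x, y, z = d1[gn]
    # Return the result; mutating a module-level dict here would only
    # change this worker's own copy
    return gn, (x + y + z) / 3

if __name__ == '__main__':
    start_time = timeit.default_timer()
    keys = list(d1.keys())
    with multiprocessing.Pool(2) as p:
        # A large chunksize batches the arguments and return values,
        # cutting the number of pickled messages between processes
        d2 = dict(p.map(fun1, keys, chunksize=50000))
    print('Script finished')
    print(timeit.default_timer() - start_time)

Even so, averaging three floats is so cheap that the sequential loop may still win; multiprocessing only starts to pay off once each call does substantially more work per item.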
