I am using the multiprocessing module in Python and expect some overhead from launching the process, creating a queue, and putting values on and getting them off the queue. However, if the sub-process has enough work to do, I would expect that overhead to eventually be washed out. Running a simple example (explained below), the runtime of my spawned process is about 10 times that of the same computation run in the parent process, even for very large jobs.
In the following code, I compute the mean of a series of larger and larger arrays. I compare calling numpy.mean from the parent process with calling the same mean function in a single spawned process, and with doing nothing in a spawned process (to get an idea of the overhead cost).
Initially, the results are as I expect. The total runtime is much faster when mean is called from the parent process than when called from a spawned process. For small jobs, the runtime for the spawned process is dominated by the overhead.
What is surprising, however, is that for larger jobs, the runtime for the spawned process consistently exceeds the cost of calling from the parent process by about a factor of 10.
Can anyone provide an explanation for this? Is this due to memory limitations in the sub-process? The largest arrays I test are 128 MB, 512 MB and 2 GB (2**24, 2**26 and 2**28 float64 values at 8 bytes each).
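For reference, here is a quick back-of-the-envelope check of those sizes, using the three largest n values from the code below and assuming float64 arrays (which numpy.random.rand produces):

```python
import numpy

# Each array holds n float64 values at 8 bytes apiece, so size = 8 * n bytes.
itemsize = numpy.dtype(numpy.float64).itemsize  # 8
for n in 2**numpy.arange(24, 29, 2):  # the three largest test sizes
    print(f"n = {n}: {n * itemsize / 2**20:.0f} MiB")
# prints 128, 512 and 2048 MiB
```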
Here is the code:
%matplotlib
import numpy, multiprocessing, pandas

def do_nothing(x, q):
    q.put(x[-1])

def my_mean(x, q):
    q.put(numpy.mean(x))

def test_mp(f, x):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=f, args=(x, q))
    p.start()
    p.join()
    s = q.get()
    return s

ndata = 2**numpy.arange(10, 29, 2)
tr1, tr2, tr3 = [[], [], []]
for n in ndata:
    x = numpy.random.rand(n)
    tresults = %timeit -n 1 -r 5 -o -q test_mp(do_nothing, x)
    tr1.append(tresults)
    tresults = %timeit -n 1 -r 5 -o -q test_mp(my_mean, x)
    tr2.append(tresults)
    tresults = %timeit -n 1 -r 5 -o -q numpy.mean(x)
    tr3.append(tresults)

print("All done")

t1, t2, t3 = map(lambda tr: pandas.Series([1000*t.best for t in tr]), [tr1, tr2, tr3])
df = pandas.DataFrame({'n': ndata,
                       't1 (do nothing)': t1,
                       't2 (my_mean)': t2,
                       't3 (mean)': t3})
display(df)
df.plot(x='n', style='.-', markersize=10, logx=True, logy=True)
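An aside on test_mp (my observation, not part of the question itself): the multiprocessing documentation warns that joining a process before draining its queue can deadlock once the queued object exceeds the pipe's buffer, because the child's feeder thread blocks. The scalars put here are small enough that it works, but the safer ordering is to get before join, sketched below:

```python
import multiprocessing

def safer_test_mp(f, x):
    """Like test_mp above, but drains the queue before joining the child."""
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=f, args=(x, q))
    p.start()
    s = q.get()   # receive the result first, so the child can flush its queue
    p.join()      # then wait for the child to exit
    return s
```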
Here are the results. All timing results are in milliseconds.
Comments:

- do_nothing: as you are not multithreading, the data must be communicated from one process to the other, which is an overhead proportional to n. If I repeat the experiment, I get a different result, showing a slight increase in do_nothing at line 8, then ... my PC fills its RAM.
- Since you don't use x at all in do_nothing, I don't see how the pickling cost is reflected.
- What about q.put(x[int(numpy.random.uniform(0, len(x) - 1))]) instead of q.put(x[-1])? This is still nothing for the CPU, but python will never predict it ;)
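The serialization point raised above can be checked directly (a sketch of my own, not from the post): time pickle.dumps on arrays of increasing size and see whether the cost grows with n, the way the spawned-process overhead does. If the array is serialized on its way to the child, this cost is paid regardless of what the child does with it:

```python
import pickle
import timeit

import numpy

# If x is serialized to reach the child process, the per-call overhead should
# scale with the array's byte size, independent of the work done on it.
for n in 2**numpy.arange(20, 25, 2):  # modest sizes to keep this quick
    x = numpy.random.rand(n)
    t = min(timeit.repeat(lambda: pickle.dumps(x), number=1, repeat=5))
    print(f"n = {n:>8d}: {1000 * t:6.2f} ms to pickle {x.nbytes / 2**20:.0f} MiB")
```

Note that whether x is actually pickled depends on the start method: with "spawn" the Process arguments are serialized to the child, while with "fork" the child inherits the parent's memory, so results can differ by platform.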