I am trying to parallelize my code using the multiprocessing module. The code works in two steps. In the first step I initialize a class, which calculates and holds several variables that are used in the second step. In the second step the program performs calculations using the previously initialized variables; these variables are not modified in any way. The calculation time of the first step is not important, but that of the second step is, because it is called a few hundred times in necessarily sequential order. Below is a constructed minimal example of the code structure and its output.
import numpy as np
import time
from multiprocessing import Pool

class test:
    def __init__(self):
        self.r = np.ones(10000000)

    def f(self, init):
        summed = 0
        for i in range(0, init):
            summed = summed + i
        return summed

if __name__ == "__main__":
    # first step
    func = test()

    # second step
    # sequential
    start_time = time.time()
    for i in [1000000, 1000000, 1000000, 1000000]:
        func.f(i)
    print('Sequential: ', time.time() - start_time)

    # parallel
    start_time = time.time()
    pool = Pool(processes=None)
    result = pool.starmap(func.f, [[1000000], [1000000], [1000000], [1000000]])
    print('Parallel: ', time.time() - start_time)
Output:
Sequential: 0.2673146724700928
Parallel: 1.5638213157653809
As I understand it, multiprocessing becomes slower here because the large attribute r of the class test has to be pickled and transferred to each of the worker processes along with func.f. To circumvent this I would need to initialize the class once on each worker before f is started. Is this possible with multiprocessing? Are there other tools for doing this?
A workaround would be a wrapper function that creates func = test() and runs func.f(...), and to use this function in starmap, but that would repeat the expensive initialization for every call.
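For concreteness, here is a minimal sketch of what I imagine instead, using the initializer argument of Pool to build the class once per worker process and keep it in a module-level global. The names worker_init, worker_f, and _func are only illustrative:

import numpy as np
from multiprocessing import Pool

class test:
    def __init__(self):
        self.r = np.ones(10000000)

    def f(self, init):
        summed = 0
        for i in range(0, init):
            summed = summed + i
        return summed

_func = None  # per-process instance, set by the pool initializer

def worker_init():
    # runs once in every worker process when the pool starts,
    # so the expensive __init__ is not repeated for each task
    global _func
    _func = test()

def worker_f(init):
    # only the small integer argument is pickled per task,
    # not the large array held by the instance
    return _func.f(init)

if __name__ == "__main__":
    with Pool(processes=None, initializer=worker_init) as pool:
        result = pool.map(worker_f, [1000000, 1000000, 1000000, 1000000])

Is something along these lines the intended way to do it, or is there a cleaner approach?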