I need to accumulate the results of many trees for some query that outputs a large result. Since all trees can be handled independently it is embarrassingly parallel, except for the fact that the results needs to be summed and I cannot store the intermediate results for all trees in memory. Below is a simple example of a code for the problem which saves all the intermediate results in memory (of course the functions are newer the same in the real problem since that would be doing duplicated work).
import numpy as np
from joblib import Parallel, delayed
functions=[[abs,np.round] for i in range(500)] # Dummy functions
functions=[function for sublist in functions for function in sublist]
X=np.random.normal(size=(5,5)) # Dummy data
def helper_function(function,X=X):
return function(X)
results = Parallel(n_jobs=-1,)(
map(delayed(helper_function), [functions[i] for i in range(1000)]))
results_out = np.zeros(results[0].shape)
for result in results:
results_out+=result
A solution could be the following modification:
import numpy as np
from joblib import Parallel, delayed
functions=[[abs,np.round] for i in range(500)] # Dummy functions
functions=[function for sublist in functions for function in sublist]
X=np.random.normal(size=(5,5)) # Dummy data
results_out = np.zeros(results[0].shape)
def helper_function(function,X=X,results=results_out):
result = function(X)
results += result
Parallel(n_jobs=-1,)(
map(delayed(helper_function), [functions[i] for i in range(1000)]))
But this might cause races. So it is not optimal.
Do you have any suggestions for preforming this without storing the intermediate results and still make it parallel?