I'm trying to implement multiprocessing in my code to make it faster.
To make it easier to understand, I will just say that the program fits an observed curve using a linear combination of a library of curves, and from that fit it measures properties of the observed curve.
I have to do this for over 400 curves, and in order to estimate the errors of these properties I perform a Monte Carlo simulation, which means I have to repeat each calculation a number of times.
This takes a lot of time, and since I believe it is a CPU-bound task, I figured I'd use multiprocessing in the error-estimation step. Here's a simplification of my code:
Without multiprocessing
import numpy as np
import fitting_package  # stand-in for the actual fitting routine
import multiprocessing
from collections import defaultdict

def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
    results = defaultdict(list)

    def fit(best_fit_curve, signal_to_noise, fit_kwargs, results):
        # Here noise is added to simulate a new curve (Monte Carlo simulation)
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        # The arguments from the original fit (outside the error estimation) are passed to the fitting
        fit_kwargs.update({'curve': simulated_curve})
        # The fit is performed and it returns the properties packed together
        solutions = fitting_package(**fit_kwargs)
        # There are more properties, so this is a simplification
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)

    for _ in range(iterations):
        fit(best_fit_curve, signal_to_noise, fit_kwargs, results)
    return results
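
For context, this function gets called once per curve in an outer loop roughly like the sketch below (observed_curves and the use of np.std are just stand-ins to illustrate how I use the Monte Carlo results, not my real code):

    # Rough outer loop over the ~400 curves (observed_curves is a stand-in for my real data)
    all_errors = []
    for best_fit_curve, signal_to_noise, fit_kwargs in observed_curves:
        results = estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100)
        # The spread of each property over the iterations serves as its error estimate
        all_errors.append({key: np.std(values) for key, values in results.items()})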
With multiprocessing
def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):

    def fit(best_fit_curve, signal_to_noise, fit_kwargs, queue):
        results = queue.get()
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        fit_kwargs.update({'curve': simulated_curve})
        solutions = fitting_package(**fit_kwargs)
        property_1, property_2 = solutions
        aux_dict = {'property_1': property_1, 'property_2': property_2}
        for key, value in aux_dict.items():
            results[key].append(value)
        queue.put(results)

    process_list = []
    queue = multiprocessing.Queue()
    queue.put(defaultdict(list))
    for _ in range(iterations):
        process = multiprocessing.Process(target=fit, args=(best_fit_curve, signal_to_noise, fit_kwargs, queue))
        process.start()
        process_list.append(process)
    for p in process_list:
        p.join()
    results = queue.get()
    return results
I thought using multiprocessing would save time, but it actually takes more than twice as long as the serial version. Why is this? Is there any way I can make it faster with multiprocessing?
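
For reference, this is the kind of multiprocessing.Pool rewrite I was considering instead of spawning one Process per iteration. It's only a rough sketch with the same stand-in names as above (fit_once is my own placeholder, pulled out to module level so it can be pickled), and I haven't confirmed it actually helps:

    import numpy as np
    import multiprocessing
    from collections import defaultdict
    import fitting_package  # same stand-in for the real fitting routine

    def fit_once(best_fit_curve, signal_to_noise, fit_kwargs):
        # One Monte Carlo realisation: add noise, refit, return the properties
        noise = best_fit_curve / signal_to_noise
        simulated_curve = np.random.normal(best_fit_curve, noise)
        kwargs = dict(fit_kwargs, curve=simulated_curve)  # copy, so workers don't mutate shared kwargs
        property_1, property_2 = fitting_package(**kwargs)
        return {'property_1': property_1, 'property_2': property_2}

    def estimate_errors(best_fit_curve, signal_to_noise, fit_kwargs, iterations=100):
        results = defaultdict(list)
        # A fixed pool of workers shares the iterations, instead of one Process per iteration
        with multiprocessing.Pool() as pool:
            args = [(best_fit_curve, signal_to_noise, fit_kwargs)] * iterations
            for single_result in pool.starmap(fit_once, args):
                for key, value in single_result.items():
                    results[key].append(value)
        return results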