I am trying to measure the processing time, or CPU time, of a CPU-intensive computation that has been parallelized with multiprocessing. However, simply bookending the parallelized computation with process_time() calls and taking the difference is not sufficient. For example, running the MRE
from time import perf_counter as pc, process_time as pt
from multiprocessing import Pool

n_workers = 2

def worker(k):
    total = 0
    for i in range(2*10**7):
        total += i
    return total

if __name__ == '__main__':
    print('Serial computation')
    pc_start = pc()
    pt_start = pt()
    results = n_workers * [0]
    for k in range(n_workers):
        results[k] = worker(k)
    print(f' Total run time: {pc() - pc_start} seconds')
    print(f' Total CPU time: {pt() - pt_start} seconds')

    print('Parallel computation')
    pc_start = pc()
    pt_start = pt()
    with Pool() as pool:
        results = pool.map(worker, range(n_workers))
    print(f' Total run time: {pc() - pc_start} seconds')
    print(f' Total CPU time: {pt() - pt_start} seconds')
produces
Serial computation
Total run time: 1.8759662999982538 seconds
Total CPU time: 1.859375 seconds
Parallel computation
Total run time: 1.2482177000019874 seconds
Total CPU time: 0.046875 seconds
In the above output, the run time of each computation captures the elapsed wall-clock time since the relevant pc_start = pc() call was made. The CPU time appears to represent the CPU time of the current process only, excluding the CPU time of any child processes it spawns.
Thus, the measured CPU time of the serial computation measures the total CPU time of that computation, but the measured CPU time of the parallel computation measures only the CPU time of the parent process.
In pursuit of the goal of measuring the CPU time of the full parallelized process, I could expand on the above MRE by also measuring the CPU time of each child process and then summing these CPU times of the parent and child processes.
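A sketch of that per-child bookkeeping (my own code, only tested on this toy case): each worker times its own body with process_time() and returns that measurement alongside its result, and the parent adds the summed child times to its own process_time() delta.

```python
from time import process_time as pt
from multiprocessing import Pool

def worker(k):
    start = pt()  # CPU time of this child process so far
    total = 0
    for i in range(2*10**7):
        total += i
    # Return the result plus the CPU time this worker consumed in its body.
    # Note this does not capture Pool overhead inside the child (task
    # deserialization, result pickling) outside the timed region.
    return total, pt() - start

if __name__ == '__main__':
    pt_start = pt()
    with Pool() as pool:
        pairs = pool.map(worker, range(2))
    parent_cpu = pt() - pt_start
    child_cpu = sum(t for _, t in pairs)
    results = [r for r, _ in pairs]
    print(f'Parent CPU time:   {parent_cpu} seconds')
    print(f'Children CPU time: {child_cpu} seconds')
    print(f'Summed CPU time:   {parent_cpu + child_cpu} seconds')
```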
But is this a general solution to the problem? To me, the answer is unclear: Does multiprocessing create additional background processes to manage (e.g., serialize/deserialize) variables and data sent or shared between the parent and child processes? If so, the naive approach of equating the CPU time with the sum of the CPU times of the parent and child processes may fail to include the CPU times of these background processes (which are a part of the parallel computation), making the resulting total CPU time incorrect. But how then should I measure the total CPU time for the parallel computation?
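One cross-check I am considering, in case Python-level bookkeeping misses helper processes: on Unix, resource.getrusage(resource.RUSAGE_CHILDREN) reports the accumulated CPU time of all terminated, waited-for children of the current process, regardless of what code those children ran. This is OS-level accounting rather than anything specific to multiprocessing, so it should pick up any extra processes the pool creates, but it is Unix-only and requires the children to have been reaped first.

```python
import resource  # Unix-only
from multiprocessing import Pool

def worker(k):
    total = 0
    for i in range(2*10**7):
        total += i
    return total

if __name__ == '__main__':
    pool = Pool()
    results = pool.map(worker, range(2))
    pool.close()
    pool.join()  # children must be reaped before their time is counted
    ru = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_utime/ru_stime are the user/system CPU times of all waited-for
    # children, summed by the operating system.
    print(f'Children CPU time: {ru.ru_utime + ru.ru_stime} seconds')
```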
What I have tried:
I have run many Google searches; Google's AI overview struggles with the question, and I was unable to find blog posts, discussion threads, or SO questions on the topic.