I am trying to measure the processing time, or CPU time, of a CPU-intensive computation that has been parallelized with multiprocessing. However, simply bookending the parallelized computation with process_time() calls and taking the difference is insufficient. For example, running the MRE

from time import perf_counter as pc, process_time as pt
from multiprocessing import Pool

n_workers = 2

def worker(k):
    total = 0
    for i in range(2*10**7):
        total += i
    return total

if __name__ == '__main__':

    print('Serial computation')
    pc_start = pc()
    pt_start = pt()
    results = n_workers * [0]
    for k in range(n_workers):
        results[k] = worker(k)
    print(f'  Total run time: {pc() - pc_start} seconds')
    print(f'  Total CPU time: {pt() - pt_start} seconds')

    print('Parallel computation')
    pc_start = pc()
    pt_start = pt()
    with Pool() as pool:
        results = pool.map(worker, range(n_workers))
    print(f'  Total run time: {pc() - pc_start} seconds')
    print(f'  Total CPU time: {pt() - pt_start} seconds')

produces

Serial computation
  Total run time: 1.8759662999982538 seconds
  Total CPU time: 1.859375 seconds
Parallel computation
  Total run time: 1.2482177000019874 seconds
  Total CPU time: 0.046875 seconds

In the above output, the run time of each computation is the wall-clock (stopwatch) time elapsed since the corresponding pc_start = pc() call. The CPU time, by contrast, appears to represent the CPU time of the current process only, excluding the CPU time of its child processes.

Thus, the measured CPU time of the serial computation measures the total CPU time of that computation, but the measured CPU time of the parallel computation measures only the CPU time of the parent process.

To measure the CPU time of the full parallelized computation, I could expand the above MRE by also measuring the CPU time of each child process and then summing the CPU times of the parent and child processes.

But is this a general solution to the problem? To me, the answer is unclear: Does multiprocessing create additional background processes to manage (e.g., serialize/deserialize) variables and data sent or shared between the parent and child processes? If so, the naive approach of equating the CPU time with the sum of the CPU times of the parent and child processes may fail to include the CPU times of these background processes (which are a part of the parallel computation), making the resulting total CPU time incorrect. But how then should I measure the total CPU time for the parallel computation?


What I have tried:

I have tried many Google searches; Google's AI overview struggles with the question, and I was unable to find blog posts, discussion threads, or SO questions on the topic.


1 Answer


You should probably review the documentation to understand how time.process_time() works. Essentially, what you're measuring is the CPU time used by the main process, not the subprocesses. The output from your code is perfectly reasonable.

If you want to know the total CPU time used by all subprocesses, then you could arrange for each subprocess to time itself and return a value to the main program.

Something like this:

"""
process_time example
"""

from concurrent.futures import ProcessPoolExecutor
from os import process_cpu_count
from time import process_time, monotonic

# number of subprocesses
NP = 2
# number of available CPUs - 1
# if number of CPUs cannot be determined, assume there are 2
# NPROCS must be at least 1
NPROCS = max((process_cpu_count() or 2) - 1, 1)


def process():
    """calculate cpu time for tight loop"""
    start = process_time()
    total = 0
    for i in range(2 * 10**7):
        total += i
    cpu_time = process_time() - start
    return (total, cpu_time)


if __name__ == "__main__":
    real_time_start = monotonic()
    main_start = process_time()
    max_workers = min(NP, NPROCS)
    with ProcessPoolExecutor(max_workers=max_workers) as ppe:
        futures = [ppe.submit(process) for _ in range(NP)]
        total_cpu = 0.0
        for future in futures:
            _, cpu = future.result()
            total_cpu += cpu
    main_cpu_time = process_time() - main_start
    real_time_duration = monotonic() - real_time_start
    print(f"Total CPU time for main program = {main_cpu_time:.4f}s")
    print(f"Total CPU time for {NP} subprocesses = {total_cpu:.4f}s")
    print(f"Mean CPU time per subprocess = {total_cpu/NP:.4f}s")
    print(f"Real time duration of main program = {real_time_duration:.4f}s")

Output:

Total CPU time for main program = 0.0129s
Total CPU time for 2 subprocesses = 1.7267s
Mean CPU time per subprocess = 0.8633s
Real time duration of main program = 1.0391s

Platform:

Python 3.13.5
macOS 15.5
10 core Intel Xeon 3GHz
32GB 2,666MHz DDR4

3 Comments

Thanks for the ideas. That link is interesting but does not discuss parent/child processes and the exchange of variables and data between processes, which are central issues here. I have also edited my question to make it clear that the behavior of process_time() is, as you say, perfectly reasonable. Yet I am still unsure how to answer the posted question.
@SapereAude I have made a significant edit to this answer which I hope you will find helpful
This looks like an instructive implementation of the "naive" approach described in the question (worth an upvote for sure). However, the question is whether the naive approach is accurate or not based on how multiprocessing works, and this answer doesn't appear to resolve that. So let me hold off on accepting the answer for now.
