
To my understanding, multiprocessing uses fork() on Linux, which means each process created by multiprocessing gets its own copy of the parent's memory space, and changes made in one process do not affect any other forked process.

But I encountered this rather strange situation:

import multiprocessing

i = -1

def change(j):
    global i
    print(i, end=" ")  # should print -1
    i = j 

with multiprocessing.Pool(20) as p:
    p.map(change, range(20))

print(i)  # should print -1

I thought this program would print -1 exactly 21 times: multiprocessing creates 20 separate subprocesses whose memory is not shared, so the line i = j cannot affect the value of i in any other process; hence i should still be -1 at every print.

However, the program actually printed a mix of -1s and a random assortment of numbers between 0 and 19.

Example:

-1 -1 -1 -1 -1 4 -1 5 -1 6 -1 8 -1 -1 14 -1 -1 12 -1 -1 -1

So my question is: why did I not get -1 exactly 21 times?

  • Although you use Pool(20), it won't really create 20 processes until they are needed, which means some tasks may be executed in the same process. You can print out os.getpid() to check. Commented Feb 24, 2019 at 4:41

3 Answers


Python 3.2 introduced the maxtasksperchild argument to Pool.

maxtasksperchild is the number of tasks a worker process can complete before it exits and is replaced with a fresh worker process, to enable unused resources to be freed. The default is None, which means worker processes live as long as the pool. Setting maxtasksperchild=1 forces every task to run in a fresh worker, so each call to change sees the freshly forked i == -1:

import multiprocessing

i = -1

def change(j):
    global i
    print(i, end=" ")  # should print -1
    i = j 

if __name__ == '__main__':
    with multiprocessing.Pool(20, maxtasksperchild=1) as p:
        p.map(change, range(20))
    print(i)  # should print -1



multiprocessing.Pool does not guarantee that each task will run in a new process. In fact, the reason you'd use multiprocessing.Pool at all is for workloads where creating a new process per task is considered too expensive, so you use a process pool to avoid that creation overhead. The typical usage pattern is to create many tasks and a pool with a small number of workers (usually dependent on the number of CPU cores your machine has); the pool schedules the tasks onto the workers and reuses processes whenever possible. If you always want a fresh process per task, you should use multiprocessing.Process instead.



It's a common misconception that it won't, but Pool(20) does create all 20 processes immediately. In fact, the worker processes are all started before the handler thread, which later feeds tasks into the pool's internal inqueue for the workers to process.

The worker processes run multiprocessing.pool.worker code until they block on a .get() from the inqueue. It's just that not all of them get scheduled to read from the shared queue during the short time this all takes: queue reads are sequential, so only one process can read at a time, and while your OS runs something else on the cores, some workers happen to grab multiple tasks while others get none. It's when a worker gets more than one task that you see values other than -1, because the second task sees the i left behind by the first.
