
To my understanding, multiprocessing uses fork() on Linux, which means each process created by multiprocessing gets its own copy of the parent's memory space, and changes made in one process do not affect any other forked process.

But I encountered this rather strange situation:

import multiprocessing

i = -1

def change(j):
    global i
    print(i, end=" ")  # should print -1
    i = j 

with multiprocessing.Pool(20) as p:
    p.map(change, range(20))

print(i)  # should print -1

I thought this program would print -1 exactly 21 times: multiprocessing creates 20 separate subprocesses whose memory is not shared, so the line i = j cannot affect the value of i in any other process; hence i should still be -1 at every print.

However, the program actually printed a mix of -1s and a random assortment of numbers between 0 and 19.

Example:

-1 -1 -1 -1 -1 4 -1 5 -1 6 -1 8 -1 -1 14 -1 -1 12 -1 -1 -1

So my question is: why did I not get -1 exactly 21 times?

  • Although you use Pool(20), it won't really create 20 processes until they are needed, which means some tasks may be executed in the same process. You can print out os.getpid() to check. Commented Feb 24, 2019 at 4:41

3 Answers


Python 3.2 introduced the maxtasksperchild argument to Pool.

maxtasksperchild is the number of tasks a worker process can complete before it exits and is replaced with a fresh worker process, to enable unused resources to be freed. The default is None, which means worker processes live as long as the pool. Setting maxtasksperchild=1 forces every task to run in a fresh worker, so each call to change sees the freshly forked i == -1:

import multiprocessing

i = -1

def change(j):
    global i
    print(i, end=" ")  # should print -1
    i = j 

if __name__ == '__main__':
    with multiprocessing.Pool(20, maxtasksperchild=1) as p:
        p.map(change, range(20))
    print(i)  # should print -1



multiprocessing.Pool does not guarantee that each task will run in a new process. In fact, the reason you'd use multiprocessing.Pool at all is for workloads where creating a new process per task is considered too expensive, so you use a process pool to avoid that creation overhead. The typical usage pattern is to create many tasks and a pool with a small number of workers (usually dependent on the number of CPU cores your machine has); the pool schedules the tasks onto the workers and reuses processes whenever possible. If you always want a fresh process per task, you should use multiprocessing.Process instead.



It's a common misconception that it won't, but Pool(20) does create all 20 processes immediately. In fact, the worker processes are all started before the handler thread, which later feeds tasks into the pool's internal inqueue for the workers to process.

The worker processes run multiprocessing.pool.worker code until they block on a .get() from the inqueue. It's just that not all of them get scheduled to read from the shared queue during the short time this all takes: queue reads are sequential, so only one process can read at a time, and while your OS runs something else on the cores, some workers happen to grab multiple tasks while others get none. It's when a worker gets more than one task that you see values other than -1, because the second task sees the i left behind by the first.
