
I have been dabbling with Python's multiprocessing library, and although it provides an incredibly easy-to-use API, its documentation is not always very clear. In particular, I find the argument maxtasksperchild, passed to an instance of the Pool class, very confusing.

The following comes directly from Python's documentation (3.7.2):

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

The above raises more questions for me than it answers. Is it bad for a worker process to live as long as the pool? What makes a worker process 'fresh', and when is that desired? In general, when should you set maxtasksperchild explicitly instead of letting it default to None, and what are the best practices for maximizing processing speed?

From @Darkonaut's amazing answer on chunksize I now understand what chunksize does and represents. Since supplying a value for chunksize affects the number of 'tasks', I was wondering whether there are any considerations to be made regarding their interdependence to ensure maximum performance?

Thanks!

1 Answer


Normally you don't need to touch this. Problems can sometimes arise, for example with code that calls outside of Python and leaks memory. Limiting the number of tasks a worker process completes before it gets replaced then helps, because the "unused resources" it erroneously accumulates are released when the process gets scrapped. Starting a new, "fresh" process keeps the problem contained. Because replacing a process costs time, for performance you leave maxtasksperchild at its default. If you run into unexplainable resource problems some day, you can try setting maxtasksperchild=1 to see if that changes something. If it does, something is likely leaking resources.
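To make the replacement visible, here is a minimal, self-contained sketch (the function name get_worker_pid is made up for the example) that records which worker PIDs handled the tasks; with maxtasksperchild=1 every task runs in a freshly spawned process:

```python
import os
from multiprocessing import Pool

def get_worker_pid(_):
    # Return the PID of the worker process that executed this task.
    return os.getpid()

if __name__ == '__main__':
    # Default (maxtasksperchild=None): the same long-lived workers
    # handle all tasks, so only two distinct PIDs show up.
    with Pool(processes=2) as pool:
        pids = pool.map(get_worker_pid, range(8), chunksize=1)
    print('default:           ', sorted(set(pids)))

    # maxtasksperchild=1: each worker exits after a single task (here,
    # chunksize=1 makes one task per item) and is replaced, so many
    # distinct PIDs appear -- at the cost of process-startup overhead.
    with Pool(processes=2, maxtasksperchild=1) as pool:
        pids = pool.map(get_worker_pid, range(8), chunksize=1)
    print('maxtasksperchild=1:', sorted(set(pids)))
```

Note that maxtasksperchild counts tasks, i.e. chunks, which is why chunksize=1 is passed explicitly above; with a larger chunksize, one "task" covers several items.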


11 Comments

Thank you very much for your quick and clear answer @Darkonaut! I was secretly hoping that YOU would see my question and answer it, since you appear to be the main expert here on SO regarding Python's multiprocessing.Pool class. Thanks again!
@Marnix.hoh You're welcome! Pretty sure your phrase about the "expert" is not true, but thanks for your feedback ;)
Haha, I think you're too modest ;). I actually have another question, and I was wondering if you happen to know the answer off the top of your head. I want to use pool.map() to apply a function to a list of objects, where the function modifies a property on each of the objects. Is there a way to make this work using 'map', or should I use a different method on 'pool'?
@Marnix.hoh The objects will be copied as soon as you use multiprocessing.Pool with any pool method, so you don't modify the object you have in your parent; you create new objects in your worker processes (see the first sketch after these comments).
@Marnix.hoh... There will be copying, but that need not be a problem in every scenario. If you need multiple processes to modify one and the same complex object, using managers and proxies might be an option (see the second sketch below), or you could look into something like ray.
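To illustrate the two points from this comment thread, here are two small sketches (the names Item, modify and set_square are invented for the examples). The first shows that pool.map() operates on pickled copies, so the usual pattern is to return the modified objects and use the results rather than expecting in-place changes:

```python
from multiprocessing import Pool

class Item:
    def __init__(self, value):
        self.value = value

def modify(item):
    item.value += 1   # mutates the worker's pickled *copy* of the object
    return item       # return it so the parent receives the change

if __name__ == '__main__':
    items = [Item(i) for i in range(3)]
    with Pool(2) as pool:
        results = pool.map(modify, items)
    print([it.value for it in items])    # [0, 1, 2] -- parent's objects untouched
    print([it.value for it in results])  # [1, 2, 3] -- modified copies sent back
```

The second sketches the manager/proxy option mentioned above: a Manager-hosted dict lives in a separate server process, so writes from the workers are visible in the parent, at the cost of a round-trip per access:

```python
from multiprocessing import Manager, Pool

def set_square(args):
    shared, key = args
    shared[key] = key ** 2   # the write travels through the proxy to the manager

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()  # proxy to a dict held by the manager process
        with Pool(2) as pool:
            pool.map(set_square, [(shared, k) for k in range(5)])
        print(dict(shared))      # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```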
