
I have been dabbling with Python's multiprocessing library, and although it provides an incredibly easy-to-use API, its documentation is not always very clear. In particular, I find the argument maxtasksperchild, passed to an instance of the Pool class, very confusing.

The following comes directly from Python's documentation (3.7.2):

maxtasksperchild is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default maxtasksperchild is None, which means worker processes will live as long as the pool.

The above raises more questions for me than it answers. Is it bad for a worker process to live as long as the pool? What makes a worker process 'fresh', and when is that desired? In general, when should you set maxtasksperchild explicitly instead of letting it default to None, and what are the best practices for maximizing processing speed?

From @Darkonaut's amazing answer on chunksize I now understand what chunksize does and represents. Since supplying a value for chunksize affects the number of 'tasks', I was wondering whether there are any considerations to be made regarding their interdependence to ensure maximum performance?

Thanks!

1 Answer


Normally you don't need to touch this. Problems can sometimes arise, for example with code that calls outside of Python and leaks memory. Limiting the number of tasks a worker process completes before it gets replaced then helps, because the "unused resources" it erroneously accumulates are released when the process gets scrapped. Starting a new, "fresh" process keeps the problem contained. Because replacing a process costs time, for performance you leave maxtasksperchild at its default. If you run into unexplainable resource problems some day, you can try setting maxtasksperchild=1 to see if that changes something. If it does, something is likely leaking resources.
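To make the replacement visible, here is a minimal, self-contained sketch (the function name get_worker_pid is made up for the example) that records which worker PIDs handled the tasks; with maxtasksperchild=1 every task runs in a freshly spawned process:

```python
import os
from multiprocessing import Pool

def get_worker_pid(_):
    # Return the PID of the worker process that executed this task.
    return os.getpid()

if __name__ == '__main__':
    # Default (maxtasksperchild=None): the same long-lived workers
    # handle all tasks, so only two distinct PIDs show up.
    with Pool(processes=2) as pool:
        pids = pool.map(get_worker_pid, range(8), chunksize=1)
    print('default:           ', sorted(set(pids)))

    # maxtasksperchild=1: each worker exits after a single task (here,
    # chunksize=1 makes one task per item) and is replaced, so many
    # distinct PIDs appear -- at the cost of process-startup overhead.
    with Pool(processes=2, maxtasksperchild=1) as pool:
        pids = pool.map(get_worker_pid, range(8), chunksize=1)
    print('maxtasksperchild=1:', sorted(set(pids)))
```

Note that maxtasksperchild counts tasks, i.e. chunks, which is why chunksize=1 is passed explicitly above; with a larger chunksize, one "task" covers several items.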


11 Comments

Thank you very much for your quick and clear answer @Darkonaut! I was secretly hoping that YOU would see my question and answer it, since you appear to be the main expert here on SO regarding Python's multiprocessing.Pool class. Thanks again!
@Marnix.hoh You're welcome! Pretty sure your phrase about the "expert" is not true, but thanks for your feedback ;)
Haha, I think you're too modest ;). I actually have another question, and I was wondering if you happen to know the answer off the top of your head. I want to use pool.map() to apply a function to a list of objects, where the function modifies a property on each of the objects. Is there a way to make this work using 'map', or should I use a different method on 'pool'?
@Marnix.hoh The objects will be copied as soon as you use multiprocessing.Pool with any pool method, so you don't modify the object you have in your parent; you create new objects in your worker processes (see the first sketch after these comments).
@Marnix.hoh... There will be copying, but that need not be a problem in every scenario. If you need multiple processes to modify one and the same complex object, using managers and proxies might be an option (see the second sketch below), or you could look into something like ray.
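To illustrate the two points from this comment thread, here are two small sketches (the names Item, modify and set_square are invented for the examples). The first shows that pool.map() operates on pickled copies, so the usual pattern is to return the modified objects and use the results rather than expecting in-place changes:

```python
from multiprocessing import Pool

class Item:
    def __init__(self, value):
        self.value = value

def modify(item):
    item.value += 1   # mutates the worker's pickled *copy* of the object
    return item       # return it so the parent receives the change

if __name__ == '__main__':
    items = [Item(i) for i in range(3)]
    with Pool(2) as pool:
        results = pool.map(modify, items)
    print([it.value for it in items])    # [0, 1, 2] -- parent's objects untouched
    print([it.value for it in results])  # [1, 2, 3] -- modified copies sent back
```

The second sketches the manager/proxy option mentioned above: a Manager-hosted dict lives in a separate server process, so writes from the workers are visible in the parent, at the cost of a round-trip per access:

```python
from multiprocessing import Manager, Pool

def set_square(args):
    shared, key = args
    shared[key] = key ** 2   # the write travels through the proxy to the manager

if __name__ == '__main__':
    with Manager() as manager:
        shared = manager.dict()  # proxy to a dict held by the manager process
        with Pool(2) as pool:
            pool.map(set_square, [(shared, k) for k in range(5)])
        print(dict(shared))      # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```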
