
I am trying to use Python's multiprocessing library, specifically its map function, hoping to gain some performance. For some reason, when I swap it out for its single-process counterpart I don't see high memory usage, but using the multiprocessing version of map sends my memory through the roof. For the record, I am doing something that can easily hog loads of memory, but what difference between the two would cause such a stark contrast?

  • For the record, this doesn't sound like a memory leak at all, just like memory use. Commented Apr 24, 2010 at 21:48
  • Very true, once again, bad wording on my part. Commented Apr 24, 2010 at 23:51

1 Answer


You realize that multiprocessing does not use threads, yes? I say this because you mention a "single threaded counterpart".

Are you sending a lot of data through multiprocessing's map? A likely cause is the serialization multiprocessing has to do in many cases. multiprocessing uses pickle, which typically takes up more memory than the data it's pickling. (In some cases, specifically on systems with fork() where new processes are created when you call the map method, it can avoid the serialization, but whenever it needs to send new data to an already-running process it cannot avoid it.)
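To make the cost concrete: pickling produces a separate bytes buffer in the parent, and unpickling produces an independent copy in the worker, so the same payload can exist in memory several times over. A tiny sketch of that round trip:

```python
import pickle

data = {"payload": list(range(10))}

wire = pickle.dumps(data)   # serialized bytes: extra memory in the sender
copy = pickle.loads(wire)   # deserialized object: a full copy in the receiver

# The copy is equal to, but completely independent of, the original
assert copy == data and copy is not data
```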

Since with multiprocessing all of the actual work is done in separate processes, the memory of your main process should not be affected by the actual operations you perform. The total memory use does go up by quite a bit, however, because each worker process has a copy of the data you sent across. On systems that have copy-on-write, this is sometimes CoW memory (in the same cases as not serializing), but Python's reference counting writes to an object's memory even on read-only access, so those pages quickly become written to, and thus copied.


2 Comments

Right, sorry about that; I do know that multiprocessing doesn't in fact use threads (hence the name). So sending the information over the pipe is what is killing it. Makes lots of sense. Do you know of any solutions to the problem I'm facing?
Send over less data. Or send it over in smaller chunks. Or, if you're on a system with fork(), arrange things so the serialization doesn't happen: have the data in place before multiprocessing starts the new processes, so the workers inherit it instead of receiving it.
