
I have code like this:

from multiprocessing import Pool

def do_stuff(idx):
    for i in items[idx:idx+20]:
        # do stuff with i

items = # a huge nested list
pool = Pool(5)
pool.map(do_stuff, range(0, len(items), 20))
pool.close()
pool.join()

The issue is that the pool does not share items between workers; instead, each worker process gets its own copy, which is a problem since the list is huge and it hogs memory. Is there a way to implement this so that items is shared? I found some examples using global that work with the basic threading library, but that approach does not seem to apply to the multiprocessing library.

Thanks!

1 Answer

thread and multiprocessing are not interchangeable.

Threads run inside a single process and share one address space (with the Global Interpreter Lock serializing bytecode execution), which is why sharing variables between threads is easy. multiprocessing, on the other hand, starts separate processes, each with its own memory: module-level variables like items are copied into (or re-created in) every worker rather than shared, and changes made in a worker never reach the parent.
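To illustrate the separate-memory point, here is a minimal sketch (the list name data and the task function are made up for the demo):

```python
from multiprocessing import Pool

data = []  # module-level list; each worker process gets its own copy

def append_item(x):
    data.append(x)          # mutates only the worker's own copy
    return len(data)

if __name__ == "__main__":
    with Pool(2) as pool:
        pool.map(append_item, range(4))
    print(data)  # [] -- the parent's list is untouched by the workers
```

Within a single process the mutation is of course visible; it is only across process boundaries that the copies diverge.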

A better way to do this is to have do_stuff return its result and then combine the results in the parent process.

Look at the documentation here: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

In your case it looks like you should use it like this:

from multiprocessing import Pool

def do_stuff(idx):
    results = []
    for i in items[idx:idx+20]:
        # do stuff with i, collecting into results
        ...
    return results

items = # a huge nested list
pool = Pool(5)
multiple_results = [pool.apply_async(do_stuff, (i,)) for i in range(0, len(items), 20)]
multiple_results = [res.get() for res in multiple_results]
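For reference, a self-contained, runnable version of that apply_async pattern, with a small dummy list standing in for the huge one (the partial-sum work inside do_stuff is just an example):

```python
from multiprocessing import Pool

items = list(range(100))  # stand-in for the huge nested list

def do_stuff(idx):
    # each task processes one chunk of 20 and returns its result
    return sum(items[idx:idx + 20])

if __name__ == "__main__":
    with Pool(5) as pool:
        async_results = [pool.apply_async(do_stuff, (i,))
                         for i in range(0, len(items), 20)]
        results = [res.get() for res in async_results]
    print(results)  # [190, 590, 990, 1390, 1790]
```

Note that apply_async takes its positional arguments as a tuple, which is why the call is `pool.apply_async(do_stuff, (i,))` rather than `pool.apply_async(do_stuff, i)`.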

Edit, based on the comments:

from multiprocessing import Pool

def do_stuff(chunk):
    for i in chunk:
        # do stuff with i

items = # a huge nested list
pool = Pool(5)
# split items into chunks of twenty, one chunk per task
pool.map(do_stuff, [items[i:i+20] for i in range(0, len(items), 20)])
pool.close()
pool.join()

12 Comments

There is no writing to the list involved; it is only read, some calculations happen, and the results go to a DB. I just want to speed it up.
@PapeK24 Oh, I understand now; in that case this is an XY problem. You should instead pass a slice of the main list to each worker in the pool. I will update my answer to reflect that.
Yeah, that is still not a solution; there is a list lookup involved. I just need the list variable to point to the same place in memory somehow.
It may even be impossible for workers in a pool to share memory, since they are literally different processes; I'm not sure how it is implemented.
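If re-sending the data is the real concern, one common workaround (not from the answer above, just a standard pattern) is a Pool initializer: the list is pickled and sent to each worker process once, rather than once per task. On POSIX systems using the fork start method, workers also inherit module-level globals via copy-on-write, so a read-only list defined before Pool() is created is effectively shared; for guaranteed cross-platform sharing there is multiprocessing.shared_memory in Python 3.8+. A minimal initializer sketch (the names init_worker and _items are made up, and the summing is placeholder work):

```python
from multiprocessing import Pool

_items = None  # set once in each worker by the initializer

def init_worker(items):
    global _items
    _items = items          # sent to each worker once, not per task

def do_stuff(idx):
    # workers read their own copy of the list; nothing is re-sent per task
    return sum(_items[idx:idx + 20])

if __name__ == "__main__":
    items = list(range(100))
    with Pool(5, initializer=init_worker, initargs=(items,)) as pool:
        results = pool.map(do_stuff, range(0, len(items), 20))
    print(results)  # [190, 590, 990, 1390, 1790]
```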
