
I have the following code:

pool = Pool(cpu_count())
pool.imap(process_item, items, chunksize=100)

In process_item() I use structures that are expensive to create but reusable (though not shareable between concurrent workers). Currently each call to process_item() creates the resource anew in a local variable. Creating it once per worker and then reusing it would be a significant performance gain.

Question

How can I create cpu_count() dedicated instances of the resource, one per worker, and how should process_item() be written so that it accesses the instance belonging to its own worker?

  • You can use a factory function, but that would mean using multiprocess, a fork of multiprocessing, since factory functions aren't picklable by default in Python. Commented Jul 13, 2022 at 12:19
  • ?... I am already using multiprocessing; pool.imap forks multiple Python processes... Commented Jul 13, 2022 at 12:29
  • One is multiprocessing and the other is multiprocess. They are identical except that the one without the "ing" uses dill, which can pickle more things in general. Are you okay with using libraries outside the standard library? Commented Jul 13, 2022 at 12:38

1 Answer

If you cannot use anything outside the standard library, I would suggest using an initializer when creating the pool:

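from multiprocessing import Pool
import os
import random


class A:
    """Stand-in for a resource that is expensive to create."""

    def __init__(self):
        self.var = random.randint(0, 1000)

    def get(self):
        # Printing the PID shows which worker process owns this instance.
        print(self.var, os.getpid())


def worker(some_arg):
    # expensive_var was created by initializer() in this same process.
    expensive_var.get()


def initializer(*args):
    # Runs once per worker process; the global lives only in that process.
    global expensive_var
    expensive_var = A()


if __name__ == "__main__":
    pool = Pool(8, initializer=initializer, initargs=())
    for result in pool.imap(worker, range(100)):
        continue
    pool.close()
    pool.join()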

Create the expensive objects inside the initializer and declare them global. You can then use them inside the function you pass to the pool. This works because the initializer is executed once in each pool process when that process starts. The global variable therefore exists only within that child process, and the function you passed to the pool can access it whenever it runs there.

There was a Stack Overflow answer that explained all this better, but I can't find it right now. This is the gist of it, though.

2 Comments

Perhaps this is the answer you're remembering? stackoverflow.com/a/42817946/2112722
@SarahMesser Not the one I was referring to, but it's close enough to offer some context.
