
I'm using a Pool of workers and want each of them to be initialized with a specific object. More precisely, the initialization cannot be parallelized, so I plan to prepare the objects in the main process before creating the workers and to give each worker one of these objects.

Here is my attempt:

import multiprocessing
import random
import time

class Foo:
    def __init__(self, param):
        # NO WAY TO PARALLELIZE THIS !!
        print(f"Creating Foo with {param}")
        self.param = param

    def __call__(self, x):
        time.sleep(1)
        print("Do the computation", self)
        return self.param + str(x)

def initializer():
    global myfoo

    param = random.choice(["a", "b", "c", "d", "e"])
    myfoo = Foo(param)

def compute(x):
    return myfoo(x)

multiple_results = []
with multiprocessing.Pool(2, initializer, ()) as pool:
    for i in range(1, 10):
        work = pool.apply_async(compute, (i,))
        multiple_results.append(work)

    print([res.get(timeout=2) for res in multiple_results])

Here is a possible output:

Creating Foo with b
Creating Foo with a
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
Do the computation <__main__.Foo object at 0x7f8d70aa7fd0>
['b1', 'a2', 'b3', 'a4', 'b5', 'a6', 'b7', 'a8', 'b9']

What puzzles me is that the address of the Foo object is always the same, while the actual Foo objects are different, as can be seen from the output: "b1", "a2".

My problem is that the two calls to initializer are parallelized while I do not want to parallelize the construction of Foo.

I want some magical method add_worker to do something like this:

pool = multiprocessing.Pool()
for i in range(0,2):
    foo = Foo()
    pool.add_worker(initializer, (foo,))

Any ideas?
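
For what it's worth, one pattern that could approximate this is to build the objects sequentially in the main process and let each worker pull exactly one of them from a shared queue in its initializer. A minimal sketch, reusing the Foo and compute definitions above and assuming Foo instances are picklable (the queue_initializer name is made up):

import multiprocessing

def queue_initializer(foo_queue):
    # Each worker pops exactly one pre-built Foo; nothing is constructed in parallel.
    global myfoo
    myfoo = foo_queue.get()

if __name__ == "__main__":
    foos = multiprocessing.Queue()
    for param in ["a", "b"]:          # one object per worker, built sequentially here
        foos.put(Foo(param))          # requires Foo instances to be picklable

    with multiprocessing.Pool(2, queue_initializer, (foos,)) as pool:
        print(pool.map(compute, range(1, 10)))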

EDIT: I solved my real-life problem by making the import of Keras's VGGNet happen inside the process instead of at the top of the file; see this answer. For the sake of curiosity, I remain interested in an answer.
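
For reference, a rough sketch of that workaround, with the heavy import moved into the pool initializer (keras.applications.vgg16.VGG16 and the weights argument are assumptions about the real setup):

def initializer():
    global model
    # Import Keras inside the worker process rather than at module level, so the
    # expensive import and model construction happen once per worker.
    from keras.applications.vgg16 import VGG16
    model = VGG16(weights="imagenet")

def compute(x):
    # Each worker reuses its own model instance.
    return model.predict(x)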

  • Interesting. On Windows I get two separate addresses... Maybe it has to do with a difference in memory management between the two systems. Maybe on Linux the addresses are relative to each process's memory area, so you get the same address... Commented Jan 7, 2021 at 8:35
  • This is an interesting issue, but I have one question. Why does it bother you? As long as the results are ok... Commented Jan 7, 2021 at 8:39
  • @Tomerikoo Because the initializer function just doesn't finish when parallelized. In my real-life setting, the Foo object creates a keras.applications.vgg16 model; maybe I should open a specific question, "how to parallelize vgg16"? Commented Jan 7, 2021 at 8:47
  • If you want different instances in each worker, then create them in compute. Commented Jan 7, 2021 at 9:15
  • @furas If you create a new instance of Foo in compute, every execution of compute will have a different Foo instance, not every worker. So if you have 10 calls of compute with 2 workers, you will have 10 different instances of Foo, while @Laurent Clasessesns seems to want only two of them, one for each worker (a lazy per-worker variant is sketched below). If having a different instance of Foo for each execution of compute is fine, then your approach is right. Commented Jan 7, 2021 at 9:26
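
A lazy variant along these lines gives exactly one Foo per worker while still deferring creation to compute; a sketch reusing the question's Foo class (note that constructions in different workers can still overlap in time, just as with the initializer):

myfoo = None

def compute(x):
    global myfoo
    if myfoo is None:
        # First call inside this worker: build this worker's own Foo.
        myfoo = Foo(random.choice(["a", "b", "c", "d", "e"]))
    return myfoo(x)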

1 Answer


As you said, you can see that instances of Foo are actually different for each worker, even if they apparently have the same address.

As explained in answers to this question and this question, you should not rely on addresses to distinguish between instances on different processes: different instances can show the same address.
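
To make this visible, a small variation of the question's compute (just a sketch) can print the process id next to the address:

import os

def compute(x):
    # id(myfoo) can be identical across workers (the same virtual address after fork),
    # while os.getpid() shows the calls run in different processes with different Foo objects.
    print(os.getpid(), hex(id(myfoo)), myfoo.param)
    return myfoo(x)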


2 Comments

I was also thinking about addresses. Older Intel/AMD CPUs could use addresses relative to the process instead of absolute addresses, and it seems something similar happens here.
Thanks for your answer, which clarifies one point. But this does not answer the main question.
