I am trying to use the multiprocessing module to parallelize a CPU-intensive piece of code over multiple cores. This module looks terrific in several respects, but when I pass lists and dictionaries between processes, changes are not always propagated across the processes as I would expect. What are the rules for this? For example, how do I propagate deep changes in nested lists and dictionaries between processes?

Below is an MRE showing a simple instance of an apparent failure of propagation. If I change shared_list in parent_func with .append or .extend, the data is propagated to child_func, but if I assign a new list to shared_list outright, propagation does not occur.

from time import sleep
from multiprocessing import Manager, Process

def parent_func(shared_list):
    sleep(1)
    shared_list.extend(['a','b','c'])  # propagated
    # shared_list = ['a','b','c']  # not propagated

def child_func(shared_list):
    k = 0
    while k < 8:
        k += 1
        print(f'{k}: {shared_list}')
        sleep(0.2)

def main_func():
    with Manager() as manager:
        shared_list = manager.list()

        process = Process(target=parent_func, args=(shared_list,))
        processes = [process]
        process.start()
        process = Process(target=child_func, args=(shared_list,))
        processes.append(process)
        process.start()

        for process in processes:
            process.join()

        print('---')
        print(list(shared_list))

if __name__ == '__main__':
    main_func()

For dictionaries, an example somewhat similar to the above is shown here.


What I have tried:

I have checked the multiprocessing documentation, but could not find much on this question there. As a separate issue, Google AI is currently displaying inline code phrases as empty gray boxes, so I am unable to obtain a Google AI summary on the topic.

  • shared_list = ['a','b','c']: of course that won't be propagated. You aren't "setting a list to another list"; you are assigning a new object (a regular list) to the variable shared_list. You need to mutate the managed list object. There is nothing to propagate because you didn't mutate the managed list object. Commented Jun 25 at 15:51
  • Okay, thanks. But what is the set of code operations w.r.t. shared_list that "mutate" the managed list object versus that merely assign a new list to shared_list? I may be able to guess some of each, but it would be helpful to have someone with more knowledge of the topic to offer an explicit set of each (even if the two sets are not utterly comprehensive, and even if the answer involves basic ideas of reference in Python). I am also very interested in the same for an analogous variable shared_dict, and for lists of dicts and dicts of lists, but I'll take what I can get. Commented Jun 25 at 16:32
  • Basically, anything that is a method call would be a good rule of thumb here, i.e. my_shared_list.some_method(whatever). Assigning to a variable (simple assignment, e.g. myvar = whatever) never mutates. However, item assignment, e.g. data[item] = val, is actually a method call under the hood: data.__setitem__(item, val). Commented Jun 25 at 17:30
  • So, this is somewhat vaguely described here: docs.python.org/3/library/multiprocessing.html#proxy-objects and the question becomes, basically, "what operations mutate a list/dict?" Commented Jun 25 at 17:34
  • I really appreciate the rules of thumb and link, @juanpa. Cast as an answer and expanded a little, I would accept them. (No pressure though.) Commented Jun 25 at 21:11
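
The rule of thumb from the comments above can be checked directly. The following is a minimal sketch rather than code from the thread: the helper names rebind, mutate and run are hypothetical, and it assumes a Manager-backed list as in the question. It contrasts plain assignment, which only rebinds a local name, with operations that are method calls on the proxy.

```python
from multiprocessing import Manager, Process


def rebind(shared_list):
    # Plain assignment: the local name now refers to a brand-new ordinary
    # list. The managed list is untouched, so there is nothing to propagate.
    shared_list = ['a', 'b', 'c']


def mutate(shared_list):
    # Each statement below is a method call on the proxy, so it is
    # forwarded to the manager process and seen by every process.
    shared_list.append('a')          # explicit method call
    shared_list.extend(['b', 'c'])   # explicit method call
    shared_list[0] = 'A'             # item assignment -> __setitem__
    del shared_list[1]               # item deletion -> __delitem__


def run(worker):
    """Run worker in a child process; return the managed list's final contents."""
    with Manager() as manager:
        shared_list = manager.list()
        process = Process(target=worker, args=(shared_list,))
        process.start()
        process.join()
        return list(shared_list)


if __name__ == '__main__':
    print(run(rebind))   # the managed list stays empty
    print(run(mutate))   # the managed list reflects every mutation
```

The same distinction applies to a managed dict: shared_dict[key] = value and shared_dict.update(...) go through the proxy, while shared_dict = {...} only rebinds the local name.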

1 Answer

When you write:

shared_list = ['a','b','c']

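...all you're doing is assigning a reference to a new, ordinary list to a local variable called shared_list. The managed list that was passed in is not touched, so the other processes see no change.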

However, you can replace the contents of the managed list in place with slice assignment, which goes through the proxy:

shared_list[:] = ["x","y","z"]

So, here's a complete runnable example:

import multiprocessing as mp


def p1(list_):
    """ subprocess assigns new values to managed list """
    list_[:] = ["x", "y", "z"]


if __name__ == "__main__":
    with mp.Manager() as manager:
        args = [manager.list("abc")]
        print(*args)
        (p := mp.Process(target=p1, args=args)).start()
        p.join()
        print(*args)

Output:

['a', 'b', 'c']

['x', 'y', 'z']
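
For the analogous shared_dict case raised in the question, a dict has no slice assignment, so the nearest in-place equivalent is probably clear() followed by update(). The sketch below assumes that approach; replace_contents and demo are illustrative names, not part of the answer above.

```python
import multiprocessing as mp


def replace_contents(dict_):
    """Child process replaces the managed dict's contents in place."""
    dict_.clear()                    # method call on the proxy: propagated
    dict_.update({"x": 1, "y": 2})   # method call on the proxy: propagated


def demo():
    """Return plain-dict snapshots taken before and after the child runs."""
    with mp.Manager() as manager:
        shared_dict = manager.dict({"a": 0})
        before = shared_dict.copy()  # copy() returns an ordinary dict
        p = mp.Process(target=replace_contents, args=(shared_dict,))
        p.start()
        p.join()
        return before, shared_dict.copy()


if __name__ == "__main__":
    before, after = demo()
    print(before)
    print(after)
```

Note that clear() and update() are two separate calls to the manager, so another process reading at the wrong moment could observe the empty dict in between. If that matters, guarding the pair with a manager.Lock() is one option.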


1 Comment

Thanks. This excellent point combined with the comments of juanpa.arrivillaga above seem close to a complete answer to the question.
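
On the "deep changes" part of the question: according to the proxy-objects section of the multiprocessing documentation, mutating an ordinary list or dict stored inside a managed container is not propagated, because the proxy hands back a copy and cannot see changes made to it. Two workarounds are to write the modified value back (which triggers __setitem__ on the proxy) or to nest a second managed object. The following is a hedged sketch of all three cases; worker, demo and the key names are made up for illustration.

```python
from multiprocessing import Manager, Process


def worker(shared):
    # (1) shared['plain'] returns a local copy of the inner list, so this
    #     append changes only the copy and is lost.
    shared['plain'].append(1)

    # (2) Read, modify, write back: the final item assignment is a
    #     __setitem__ call on the proxy, so the new value is propagated.
    items = shared['reassigned']
    items.append(1)
    shared['reassigned'] = items

    # (3) The inner object is itself a managed list, so its append is
    #     forwarded to the manager like any other proxy method call.
    shared['nested'].append(1)


def demo():
    """Run worker in a child process and return a plain-dict snapshot."""
    with Manager() as manager:
        shared = manager.dict()
        shared['plain'] = []
        shared['reassigned'] = []
        shared['nested'] = manager.list()   # nested proxy
        process = Process(target=worker, args=(shared,))
        process.start()
        process.join()
        return {
            'plain': shared['plain'],
            'reassigned': shared['reassigned'],
            'nested': list(shared['nested']),
        }


if __name__ == '__main__':
    print(demo())
```

The same pattern should carry over to lists of dicts: either reassign the modified element with shared_list[i] = element, or build the inner dicts with manager.dict().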
