I am trying to use the multiprocessing module to parallelize a CPU-intensive piece of code over multiple cores. This module looks terrific in several respects, but when I try to pass lists and dictionaries between processes, changes are not always propagated across the processes as I would expect. What are the rules for this? For example, how do I propagate deep changes in nested lists and dictionaries between processes?
Below is an MRE showing a simple instance of an apparent failure of propagation. If I change shared_list in parent_func with .append or .extend, the data is propagated to child_func, but if I instead assign a new list to shared_list outright, propagation does not occur.
from time import sleep
from multiprocessing import Manager, Process

def parent_func(shared_list):
    sleep(1)
    shared_list.extend(['a','b','c']) # propagated
    # shared_list = ['a','b','c'] # not propagated

def child_func(shared_list):
    k = 0
    while k < 8:
        k += 1
        print(f'{k}: {shared_list}')
        sleep(0.2)

def main_func():
    with Manager() as manager:
        shared_list = manager.list()
        process = Process(target=parent_func, args=(shared_list,))
        processes = [process]
        process.start()
        process = Process(target=child_func, args=(shared_list,))
        processes.append(process)
        process.start()
        for process in processes:
            process.join()
        print('---')
        print(list(shared_list))

if __name__ == '__main__':
    main_func()
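For comparison, here is a minimal sketch of what an in-place replacement might look like next to the plain rebinding in the commented-out line. It assumes the list proxy forwards slice assignment (a `__setitem__` call) to the managed list. The helper names `rebind_only`, `replace_in_place`, and `run_in_process` are hypothetical and not part of the code above.

```python
from multiprocessing import Manager, Process


def rebind_only(shared_list):
    # Rebinds the local name to a new plain list.
    # The managed list object itself is never touched.
    shared_list = ['a', 'b', 'c']


def replace_in_place(shared_list):
    # Slice assignment is a __setitem__ call on the object itself,
    # so the contents of the existing list are replaced.
    shared_list[:] = ['a', 'b', 'c']


def run_in_process(func, shared_list):
    # Hypothetical helper: run func(shared_list) in a separate process.
    process = Process(target=func, args=(shared_list,))
    process.start()
    process.join()


if __name__ == '__main__':
    with Manager() as manager:
        shared_list = manager.list(['x'])
        run_in_process(rebind_only, shared_list)
        print(list(shared_list))  # rebinding in the child left the contents alone
        run_in_process(replace_in_place, shared_list)
        print(list(shared_list))  # slice assignment replaced the contents
```

The two functions differ only in whether the left-hand side is a bare name or a subscript of the existing object.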
For dictionaries, an example somewhat similar to the above is shown here.
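As a rough sketch of the dictionary analogue (not the linked example; the function names and keys are made up for illustration), item assignment and `.update` act on the managed dict, while assigning to the bare name does not:

```python
from multiprocessing import Manager, Process


def rebind_dict(shared_dict):
    # Rebinds the local name only; the managed dict is unchanged.
    shared_dict = {'a': 1}


def mutate_dict(shared_dict):
    # Both lines are method calls on the existing object.
    shared_dict['a'] = 1          # item assignment -> __setitem__
    shared_dict.update({'b': 2})  # explicit method call


if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict()
        for func in (rebind_dict, mutate_dict):
            process = Process(target=func, args=(shared_dict,))
            process.start()
            process.join()
            # Copy the proxy's contents into a plain dict for printing.
            print(func.__name__, dict(shared_dict))
```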
What I have tried:
I have checked the multiprocessing documentation, but could not find much on this question there. As a separate issue, Google AI is currently displaying inline code phrases as empty gray boxes, so I am unable to obtain a Google AI summary on the topic.
shared_list = ['a','b','c'] of course that won't be propagated. You aren't "setting a list to another list", you are assigning a new object (a regular list) to the variable shared_list. You need to mutate the managed list object. There is nothing to propagate because you didn't mutate the managed list object.
[...] shared_list that "mutate" the managed list object versus that merely assign a new list to shared_list? I may be able to guess some of each, but it would be helpful to have someone with more knowledge of the topic to offer an explicit set of each (even if the two sets are not utterly comprehensive, and even if the answer involves basic ideas of reference in Python). I am also very interested in the same for an analogous variable shared_dict, and for lists of dicts and dicts of lists, but I'll take what I can get.
[...] my_shared_list.some_method(whatever). Assigning to a variable (simple assignment, e.g. myvar = whatever) never mutates. HOWEVER, using item-assignment, e.g. data[item] = val, is actually a method call underneath the hood, data.__setitem__(item, val).
[...] list/dict?"
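For the nested case raised above, the multiprocessing documentation notes that changes to a plain list or dict stored inside a managed container are not propagated, because the proxy cannot tell when the contained value was modified. The two usual workarounds are to assign the modified value back to the container, or to nest another managed object. A minimal sketch, with hypothetical function names and keys:

```python
from multiprocessing import Manager, Process


def append_without_writeback(shared_dict, key, value):
    # On a managed dict, shared_dict[key] hands back a copy of a plain list,
    # so this append is lost. If the stored value is itself a managed list,
    # the append is a call on that proxy and is propagated.
    shared_dict[key].append(value)


def append_with_writeback(shared_dict, key, value):
    # Fetch, modify, then store again: the final item assignment is a
    # __setitem__ call on the container, which is propagated.
    items = shared_dict[key]
    items.append(value)
    shared_dict[key] = items


def run_in_process(func, *args):
    # Hypothetical helper: run func(*args) in a separate process.
    process = Process(target=func, args=args)
    process.start()
    process.join()


if __name__ == '__main__':
    with Manager() as manager:
        shared_dict = manager.dict()
        inner = manager.list()       # nested managed list
        shared_dict['plain'] = []    # nested plain list
        shared_dict['proxy'] = inner

        run_in_process(append_without_writeback, shared_dict, 'plain', 1)
        print(shared_dict['plain'])        # the append was lost
        run_in_process(append_with_writeback, shared_dict, 'plain', 1)
        print(shared_dict['plain'])        # the re-assigned list is visible
        run_in_process(append_without_writeback, shared_dict, 'proxy', 1)
        print(list(shared_dict['proxy']))  # the nested proxy saw the append
```

Nesting managed objects avoids the write-back step, at the cost of one more round trip to the manager per access.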