I am trying to use multiprocessing for the first time and having some fairly basic issues. I have a toy example below, where two processes are adding data to a list:

from multiprocessing import Process

def add_process(all_nums_class, numbers_to_add):
    for number in numbers_to_add:
        all_nums_class.all_nums_list.append(number)

class AllNumsClass:
    def __init__(self):
        self.all_nums_list = []

all_nums_class = AllNumsClass()

p1 = Process(target=add_process, args=(all_nums_class, [1,3,5]))
p1.start()

p2 = Process(target=add_process, args=(all_nums_class, [2,4,6]))
p2.start()

all_nums_class.all_nums_list

I'd like to have the all_nums_class shared between these processes so that they can both add to its all_nums_list - so the result should be

[1,2,3,4,5,6]

instead of what I'm currently getting which is just good old

[]

Could anybody please advise? I have played around with namespace a bit but I haven't yet made it work here.

I feel I'd better mention (in case it makes a difference) that I'm doing this on Jupyter notebook.

1 Answer

You can use either a multiprocessing Queue or a Pipe to share data between processes. Queues are both thread- and process-safe. You have to be more careful with a Pipe: the data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. There is, of course, no risk of corruption from processes using different ends of the pipe at the same time.

Currently, your implementation spawns two separate processes, each with its own copy of all_nums_class. So you essentially end up with three AllNumsClass objects: one in your main program, one in p1, and one in p2. Since the processes are independent and don't share the same memory space, each one appends correctly, but only to its own self.all_nums_list. That's why when you print all_nums_class.all_nums_list in your main program, you're printing the main process's self.all_nums_list, which is an empty list. To share the data and have the processes append to the same list, I would recommend using a Queue.

Example using Queue and Process

import multiprocessing as mp

def add_process(queue, numbers_to_add):
    for number in numbers_to_add:
        queue.put(number)

class AllNumsClass:
    def __init__(self):
        self.queue = mp.Queue()
    def get_queue(self):
        return self.queue

if __name__ == '__main__':
    
    all_nums_class = AllNumsClass()

    processes = []
    p1 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [1,3,5]))
    p2 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [2,4,6]))

    processes.append(p1)
    processes.append(p2)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    output = []
    # Queue.qsize() raises NotImplementedError on some platforms (e.g. macOS);
    # after join() both producers are done, so draining with empty() is safe here.
    while not all_nums_class.get_queue().empty():
        output.append(all_nums_class.get_queue().get())
    print(output)

This implementation is concurrent rather than sequential: the two processes put their numbers into the queue in whatever order they happen to be scheduled, so every time you run it you may get a different output.

Example outputs

[1, 2, 3, 5, 4, 6]

[1, 3, 5, 2, 4, 6]

[2, 4, 6, 1, 3, 5]

[2, 1, 4, 3, 5, 6]

A simpler way to maintain an ordered or unordered list of results is to use the mp.Pool class, specifically the Pool.apply and Pool.apply_async functions. Pool.apply blocks until each call completes, so the calls run one at a time and the results come back in the order you make them, which is useful when a particular order matters. In contrast, Pool.apply_async submits all the calls at once and lets them run concurrently. An additional difference is that with Pool.apply_async we need to call the get method on each returned AsyncResult in order to obtain the return values of the finished processes.

4 Comments

thanks - I'm fiddling around with this now. My understanding of a queue is that the two processes cannot work on the same class at the same time, so I should be able to get a result of [1,3,5,2,4,6], but not [1,2,3,4,5,6]. Am I correct on this? And if so do you know if there is a way of sharing data between the two whilst they are running?
I provided an example with a class sharing the same mp.Queue. From my understanding, two processes can work on the same queue object simultaneously which is possible since mp.Queue is thread/process safe. I think they are able to share data between the two processes. Personally, I think of a mp.Queue as a global variable that is shared between the two processes but I could be wrong.
- thank you - this has really helped. It seems like mp.Queue is a good way to go.
I usually use mp.Queue when doing multiprocessing stuff but mp.Array is good too
