
I got multiple parallel processes writing into one list in python. My code is:

import threading

global_list = []

class MyThread(threading.Thread):
    ...
    def run(self):
        results = self.calculate_results()
        global_list.extend(results)


def total_results():
    for param in params:
        t = MyThread(param)
        t.start()
    while threading.active_count() > 1:
        pass
    return global_list

I don't like this approach because it has:

  1. An overall global variable -> What would be the way to have a local variable for the `total_results` function?
  2. The way I check when the list is ready seems somewhat clumsy; what would be the standard way?
  • Please note that in your current code you're modifying shared memory (the global list) from more than one thread and you need a mutex/lock around the modification operations. Commented Sep 9, 2014 at 12:33
  • @davecom: that is intuitive but perhaps not really true, given the Python Global Interpreter Lock which already makes it impossible to modify the list from multiple threads even without explicit locking. Commented Sep 9, 2014 at 12:34
  • According to effbot.org/pyfaq/… there is no need for locks, since `extend` is an atomic operation (I am using CPython, so locking is not the problem here) Commented Sep 9, 2014 at 12:36
  • I would use RLock() with acquire() and release() Commented Sep 9, 2014 at 12:36
  • @JohnZwinck Good point, but not all Python interpreters have the GIL Commented Sep 9, 2014 at 12:36
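If you do want the explicit locking suggested in the comments (portable to interpreters that do not guarantee atomic `list.extend`), a minimal sketch might look like this; the `worker` function and sample data are made up for illustration:

```python
import threading

results = []
results_lock = threading.Lock()

def worker(items):
    # Hold the lock while mutating the shared list; this is portable
    # across interpreters that may not make list.extend() atomic.
    with results_lock:
        results.extend(items)

threads = [threading.Thread(target=worker, args=([i, i + 1],)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # sorted, since thread scheduling order varies
```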

2 Answers


Is your computation CPU-intensive? If so, look at the multiprocessing module, which is included with Python and offers a fairly easy-to-use Pool class into which you can feed compute tasks and later collect all the results. If you need a lot of CPU time this will be faster anyway, because Python doesn't do threading all that well: only a single interpreter thread can run at a time in one process. Multiprocessing sidesteps that (and offers the Pool abstraction, which makes your job easier). And if you really want to stick with threads, multiprocessing has a ThreadPool too.


1 Comment

My computation is network-intensive. In this case, performance is not my concern; the poor code design is.

1 - Use a class variable shared between all Worker instances to collect your results:

from threading import Thread

class Worker(Thread):
    results = []
    ...

    def run(self):
        results = self.calculate_results()
        Worker.results.extend(results)  # extending a list is thread-safe in CPython

2 - Use join() to wait until all the threads are done, letting them get their computation time:

def total_results(params):
    # create all workers
    workers = [Worker(p) for p in params]

    # start all workers
    for w in workers:
        w.start()

    # wait for all of them to finish
    for w in workers:
        w.join()

    # get the results
    return Worker.results
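If you want to avoid even the class-level shared list, one option (a sketch, not part of the answer above) is to give each worker its own per-instance result list and merge them after join(), which keeps the combined list local to total_results. The run() body here is a stand-in for the real calculate_results():

```python
from threading import Thread

class Worker(Thread):
    def __init__(self, param):
        super().__init__()
        self.param = param
        self.results = []  # per-instance storage, no shared state

    def run(self):
        # Stand-in for the real calculate_results().
        self.results = [self.param, self.param * 10]

def total_results(params):
    workers = [Worker(p) for p in params]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # after join(), reading w.results is safe
    combined = []  # stays local to total_results
    for w in workers:
        combined.extend(w.results)
    return combined

print(total_results([1, 2]))  # [1, 10, 2, 20]
```

The result order is deterministic because the merge walks the workers list, not the thread completion order.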
