
I got multiple parallel processes writing into one list in python. My code is:

import threading

global_list = []

class MyThread(threading.Thread):
    ...
    def run(self):
        results = self.calculate_results()
        global_list.extend(results)


def total_results():
    for param in params:
        t = MyThread(param)
        t.start()
    while threading.active_count() > 1:
        pass
    return global_list

I don't like this approach because it has:

  1. An overall global variable -> What would be the way to have a local variable for the `total_results` function?
  2. The way I check when the list is ready seems somewhat clumsy; what would be the standard way?
  • Please note that in your current code you're modifying shared memory (the global list) from more than one thread and you need a mutex/lock around the modification operations. Commented Sep 9, 2014 at 12:33
  • @davecom: that is intuitive but perhaps not really true, given the Python Global Interpreter Lock which already makes it impossible to modify the list from multiple threads even without explicit locking. Commented Sep 9, 2014 at 12:34
  • According to effbot.org/pyfaq/… there is no need for locks, since `extend` is an atomic operation (I am using CPython, so locking is not the problem here) Commented Sep 9, 2014 at 12:36
  • I would use RLock() with acquire() and release() Commented Sep 9, 2014 at 12:36
  • @JohnZwinck Good point, but not all Python interpreters have the GIL Commented Sep 9, 2014 at 12:36
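If you do want the explicit locking suggested in the comments (portable to interpreters that do not guarantee atomic `list.extend`), a minimal sketch might look like this; the `worker` function and sample data are made up for illustration:

```python
import threading

results = []
results_lock = threading.Lock()

def worker(items):
    # Hold the lock while mutating the shared list; this is portable
    # across interpreters that may not make list.extend() atomic.
    with results_lock:
        results.extend(items)

threads = [threading.Thread(target=worker, args=([i, i + 1],)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # sorted, since thread scheduling order varies
```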

2 Answers


Is your computation CPU-intensive? If so, look at the multiprocessing module, which is included with Python and offers a fairly easy-to-use Pool class into which you can feed compute tasks and later collect all the results. If you need a lot of CPU time this will be faster anyway, because Python doesn't do threading all that well: only a single interpreter thread can run at a time in one process. Multiprocessing sidesteps that (and offers the Pool abstraction, which makes your job easier). And if you really want to stick with threads, multiprocessing has a ThreadPool too.


1 Comment

My computation is network-intensive. In this case, performance is not my concern; the poor code design is.

1 - Use a class variable shared between all Worker instances to collect your results:

from threading import Thread

class Worker(Thread):
    results = []
    ...

    def run(self):
        results = self.calculate_results()
        Worker.results.extend(results)  # extending a list is thread-safe in CPython

2 - Use join() to wait until all the threads are done, letting them get their computation time:

def total_results(params):
    # create all workers
    workers = [Worker(p) for p in params]

    # start all workers
    for w in workers:
        w.start()

    # wait for all of them to finish
    for w in workers:
        w.join()

    # get the results
    return Worker.results
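If you want to avoid even the class-level shared list, one option (a sketch, not part of the answer above) is to give each worker its own per-instance result list and merge them after join(), which keeps the combined list local to total_results. The run() body here is a stand-in for the real calculate_results():

```python
from threading import Thread

class Worker(Thread):
    def __init__(self, param):
        super().__init__()
        self.param = param
        self.results = []  # per-instance storage, no shared state

    def run(self):
        # Stand-in for the real calculate_results().
        self.results = [self.param, self.param * 10]

def total_results(params):
    workers = [Worker(p) for p in params]
    for w in workers:
        w.start()
    for w in workers:
        w.join()  # after join(), reading w.results is safe
    combined = []  # stays local to total_results
    for w in workers:
        combined.extend(w.results)
    return combined

print(total_results([1, 2]))  # [1, 10, 2, 20]
```

The result order is deterministic because the merge walks the workers list, not the thread completion order.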
