1

I have a simple program that processes things read in from a file. There are many things and processing is not very fast and requires a huge data structure for looking up and calculating, so running this takes a lot of time. This should be trivially parallel, but I am not sure how to best do this in Python (using Pythong 3.5 or later).

I would like to learn how to best move towards a parallel implementation. Ideally I would have a number of workers which all share the huge data structure and then read in the things and assign them round-robin to each worker in a queue. The workers would do their slow processing using the (shared) huge data structure and each store the result somehow. Once all the workers have finished, I would collect each set of results and combine them in a sequential fashion.

Is something like this possible in Python or what else would be the best way to do this in Python? I have tried but failed to understand the concurrent.futures module enough to see how I could implement a solution with it.

What I find puzzling that it is not clear what will happen "under the hood". If the callable to run for each thing is not purely functional but requires (huge) data to first get loaded, how to best achieve this? I assume I cannot simply pass along a reference to the data? Or can I? Would it work for multithreading but not multiprocessing? But would using multithreading even speed up my problem?

2
  • Everything is possible with Python. Perhaps you should start with simpler concurrent problem to get the idea of how to do it. You defiantly should look into Generators if you work with big source of data. Commented Mar 26, 2018 at 13:05
  • 1
    I asked a similar question a few years ago: stackoverflow.com/questions/20640840/… Commented Mar 26, 2018 at 13:19

1 Answer 1

1

For instance, if you don't really care which thread will treat a part of your data you can share a Queue between the threads, and loop on every thread while the queue is not empty.

class Test(Thread):

    def __init__(self, q, i):
        super().__init__()
        self.q = q
        self.id = i

    def run(self):
        while not self.q.empty():
            print(str(self.q.get()) + 'treated by thread ' + str(self.id))
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.