
I have a complex data structure (a user-defined type) on which a large number of independent calculations are performed. The data structure is basically immutable. I say basically because, although the interface looks immutable, internally some lazy evaluation is going on. Some of the lazily calculated attributes are stored in dictionaries (return values of costly functions, keyed by input parameter); a simplified sketch is below. I would like to use Python's multiprocessing module to parallelize these calculations. There are two questions on my mind.

  1. How do I best share the data-structure between processes?
  2. Is there a way to handle the lazy-evaluation problem without using locks (multiple processes writing the same value)?
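
To give an idea, the lazy caching looks roughly like this (a simplified sketch; the real class and the names are different):

class LazyData:
    """Looks immutable from the outside; caches costly results internally."""

    def __init__(self, samples):
        self._samples = samples
        self._cache = {}                          # input parameter -> costly result

    def costly_attribute(self, param):
        # lazy evaluation: compute on first request, then reuse
        if param not in self._cache:
            self._cache[param] = self._expensive_calculation(param)
        return self._cache[param]

    def _expensive_calculation(self, param):
        ...                                       # some costly function of the samples and param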

Thanks in advance for any answers, comments or enlightening questions!

2 Comments
  • How large / complex are you talking? When an independent calculation is submitted, do you know before the start which lazy attributes are needed? Commented Aug 10, 2010 at 10:24
  • The problem is basically a leave-one-out cross-validation on a large set of data-samples. It takes about two hours on my machine on a single core, but I have access to a machine with 24 cores and would like to leverage that power. I do not know in advance which of the attributes will be needed by a single calculation, but I know that eventually (over all calculations) all attributes will be needed, so I could just load them all up front (would have to test that though). Commented Aug 10, 2010 at 10:30

1 Answer


How do I best share the data-structure between processes?

Pipelines.

origin.py | process1.py | process2.py | process3.py

Break your program up so that each calculation is a separate process of the following form.

def transform1( piece ):
    # Some transformation or calculation on one piece; returns the derived data.
    ...

For testing, you can use it like this.

def t1( iterable ):
    for piece in iterable:
        more_data = transform1( piece )
        yield NewNamedTuple( piece, more_data )
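
NewNamedTuple here is just a placeholder for a record that carries the original piece plus the newly derived data; if you don't already have such a type, a minimal sketch with collections.namedtuple would be:

from collections import namedtuple

# placeholder record: the original piece plus whatever transform1 derived from it
NewNamedTuple = namedtuple( 'NewNamedTuple', [ 'piece', 'more_data' ] )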

For reproducing the whole calculation in a single process, you can do this.

for x in t1( t2( t3( the_whole_structure ) ) ):
    print( x )

You can wrap each transformation with a little bit of file I/O. Pickle works well for this, but other representations (like JSON or YAML) work well, too.

import pickle, sys
while True:
    try:
        a_piece = pickle.load( sys.stdin.buffer )
    except EOFError:
        break                                     # upstream process has finished
    more_data = transform1( a_piece )
    pickle.dump( NewNamedTuple( a_piece, more_data ), sys.stdout.buffer )

Each processing step becomes an independent OS-level process. They run concurrently and immediately consume all available OS-level resources.
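
For the source end of the pipeline, a minimal sketch of what origin.py might look like (the_whole_structure stands in for however the data is actually loaded):

import pickle
import sys

# origin.py: serialize each piece of the structure to stdout for the next stage
for piece in the_whole_structure:                 # assumed: an iterable of pieces
    pickle.dump( piece, sys.stdout.buffer )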

Is there a way to handle the lazy-evaluation problem without using locks (multiple processes write the same value)?

Pipelines.
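
Because each pipeline stage is a separate OS-level process with its own memory, the lazily computed values are private to that stage; no two processes ever write the same cache entry, so no locks are needed. If you still want memoization inside a stage, a minimal sketch (assuming the costly functions are pure; the names here are made up) uses functools.lru_cache:

from functools import lru_cache

@lru_cache( maxsize=None )
def costly_attribute( param ):
    # the expensive calculation goes here; it runs at most once per process,
    # and nothing is shared between processes, so no locking is needed
    ...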


3 Comments

Wow, that answer solves two problems that were not even in my question (how to send a complex object to another process, and how to do this in Python when the multiprocessing module is not available)!
The point is that OS-level (shared buffer) process management is (a) simpler and (b) can be as fast as more complex multi-threaded, shared-everything techniques.
@S.Lott I want to share numpy random state of a parent process with a child process. I've tried using Manager but still no luck. Could you please take a look at my question here and see if you can offer a solution? I can still get different random numbers if I do np.random.seed(None) every time that I generate a random number, but this does not allow me to use the random state of the parent process, which is not what I want. Any help is greatly appreciated.
