
I want to use multiprocessing where one of the arguments is a very large numpy array. I've researched some other posts that appear to address similar issues

Large numpy arrays in shared memory for multiprocessing: Is sth wrong with this approach?

Share Large, Read-Only Numpy Array Between Multiprocessing Processes

but being rather new to Python, I've been having trouble adapting those solutions to this template and in this form. Could you help me understand my options for passing X to the functions in a read-only manner? A simplified snippet of my code is here:

import multiprocessing as mp
import numpy as np

def funcA(X):
    # do something with X
    print 'funcA OK'

def funcB(X):
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA(X) # OK
    funcB(X) # OK
    X=np.random.rand(int(2.65e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # OK
    P.append(mp.Process(target=funcB,args=(X,))) # OK
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # FAIL
    P.append(mp.Process(target=funcB,args=(X,))) # FAIL
    for p in P:
        p.start()

    for p in P:
        p.join()

funcA and funcB accept very large numpy arrays when invoked sequentially. However, when they are invoked as separate processes, there appears to be an upper limit on the size of the numpy array that can be passed to the function. How could I best get around this?

Note:

0) I do not wish to modify X; only to read from it;

1) I’m running on 64-bit Windows 7 Professional

  • Is X unique in your code, or do you have to call the two methods on other variables as well? Commented Jan 29, 2018 at 22:51
  • It happens in Python 2 but not in Python 3 on my machine. Commented Jan 29, 2018 at 23:02
  • @RobertoTrani I plan to call the methods only on X, in conjunction with some other, much smaller variables. My motivation is that the methods (which are totally independent of one another) each take a long time, and I am trying to save time by splitting the workload across multiple processes. I assumed that because X is read-only it would not be too taxing to set up, but I am finding otherwise. Also, my apologies for misspelling your name below. Commented Jan 30, 2018 at 15:35
  • @cpicke1 Can you check whether you are running a 32-bit Python version? You can probably check this from the task manager. I don't see your problem on my Linux machine; moreover, 2.7e8 floats are around 2GB, and this issue seems related to these ones: link1 and link2. So, can we consider the problem related to the Python version you are running? If so, we can improve the solution based on mmap. Commented Jan 31, 2018 at 0:30
  • @RobertoTrani In my Python console, I applied the method found here and the result was 64. Furthermore, I have retained the installer executable that I took from the Anaconda website; it is Anaconda2-4.1.0-Windows-x86_64.exe. I think I do not have the 32bit Python. I would also like to clarify that I am not having any problem declaring or processing 5e8 floats or larger; it's only when I am trying to do something in parallel with them that the problems arise! Commented Jan 31, 2018 at 14:25
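As an aside to the bitness discussion in the comments: a quick, installer-independent way to check which interpreter is actually running is to inspect the pointer size. This is a minimal sketch using only the standard library:

```python
import struct
import sys

# Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
bits = struct.calcsize("P") * 8
print(bits)

# Cross-check: sys.maxsize exceeds 2**32 only on a 64-bit build.
print(sys.maxsize > 2**32)
```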

1 Answer


The problem may be in the data transfer to the child processes. When read-only objects are needed, I prefer to exploit the copy-on-write mechanism used by the underlying OS to manage the memory of the child processes. However, I don't know whether Windows 7 uses this mechanism. When copy-on-write is available, child processes can access areas of the parent process's memory without copying them. This trick works only if you access them in a read-only way and if the object is created before the processes are created.

Summing up, a possible solution (at least for Linux machines) is this:

import multiprocessing as mp
import numpy as np

def funcA():
    print "A {}".format(X.shape)
    # do something with the global variable X
    print 'funcA OK'

def funcB():
    print "B {}".format(X.shape)
    # do something else with the global variable X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA() # OK
    funcB() # OK

    X=np.random.rand(int(2.65e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()

UPDATE: after various comments about compatibility problems with Windows, I sketched a new solution based solely on native memory maps. In this solution I create a numpy memory map backed by a file, which is shared through the file descriptor, so it doesn't require copying the whole array into the children. I found this solution much faster than using multiprocessing.Array!

UPDATE2: The code below has been updated to avoid memory issues during the randomisation of the memory map.

import multiprocessing as mp
import numpy as np
import tempfile

def funcA(X):
    print "A {}".format(X.shape)
    # do something with X
    print 'funcA OK'

def funcB(X):
    print "B {}".format(X.shape)
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    dim = int(2.75e8)
    with tempfile.NamedTemporaryFile(delete=False) as tmpfile:  # platform-default temp dir, works on Windows too
        X = np.memmap(tmpfile, shape=dim, dtype=np.float32, mode='w+')  # create the memory map
        # init the map by chunks of size 1e8
        max_chunk_size = int(1e8)
        for start_pos in range(0, dim, max_chunk_size):
            chunk_size = min(dim-start_pos, max_chunk_size)
            X[start_pos:start_pos+chunk_size] = np.random.rand(chunk_size,)
        P=[mp.Process(target=funcA, args=(X,)), mp.Process(target=funcB, args=(X,))]
        for p in P:
            p.start()

        for p in P:
            p.join()

11 Comments

On my Windows machine this fails in the debug statement print "A {}".format(X.shape)
I suppose it happens because you are using Python 3, while this is Python 2 code. I tested the solution on my machine. Moreover, as I said in the answer, this code works only if X is created before the child processes are created (and before the two functions are called)
I switched to a Python 2 interpreter. The error is NameError: global name 'X' is not defined, even if I add a global X statement. (Python 2.7.11)
Are you defining X before calling funcA()? For instance, as I did in the main.
Windows Python does not use fork and hence cannot use the copy-on-write method. On Windows, creating a new Process instance spawns a new empty Python, which then loads up the same program as the original interpreter.
