
I want to use multiprocessing where one of the arguments is a very large numpy array. I've researched some other posts that appear to address similar issues

Large numpy arrays in shared memory for multiprocessing: Is sth wrong with this approach?

Share Large, Read-Only Numpy Array Between Multiprocessing Processes

but being rather new to Python, I've been having trouble adapting those solutions to this template and in this form. Could you help me understand my options for passing X to the functions in a read-only manner? A simplified snippet of my code is here:

import multiprocessing as mp
import numpy as np

def funcA(X):
    # do something with X
    print 'funcA OK'

def funcB(X):
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA(X) # OK
    funcB(X) # OK
    X=np.random.rand(int(2.65e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # OK
    P.append(mp.Process(target=funcB,args=(X,))) # OK
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[]
    P.append(mp.Process(target=funcA,args=(X,))) # FAIL
    P.append(mp.Process(target=funcB,args=(X,))) # FAIL
    for p in P:
        p.start()

    for p in P:
        p.join()

funcA and funcB accept very large numpy arrays when invoked sequentially. However, when they are invoked as separate processes, there appears to be an upper limit on the size of the numpy array that can be passed to the function. How could I best get around this?

Note:

0) I do not wish to modify X; only to read from it;

1) I’m running on 64-bit Windows 7 Professional

  • Is X unique in your code, or do you have to call the two methods on other variables as well? Commented Jan 29, 2018 at 22:51
  • It happens in Python 2 but not in Python 3 on my machine. Commented Jan 29, 2018 at 23:02
  • @RobertoTrani I plan to call the methods only on X, in conjunction with some other, much smaller variables. My motivation is that the methods (which are totally independent of one another) each take a long time, and I am trying to save time by splitting the workload across multiple processes. I assumed that because X is read-only it would not be too taxing to set up, but I am finding otherwise. Also, my apologies for misspelling your name below. Commented Jan 30, 2018 at 15:35
  • @cpicke1 Can you check whether you are running a 32-bit Python version? You can probably check this from the task manager. I don't see your problem on my Linux machine; moreover, 2.7e8 floats are around 2GB, and this issue seems related to these ones: link1 and link2. So, can we consider the problem related to the Python version you are running? If so, we can improve the solution based on mmap. Commented Jan 31, 2018 at 0:30
  • @RobertoTrani In my Python console, I applied the method found here and the result was 64. Furthermore, I have retained the installer executable that I took from the Anaconda website; it is Anaconda2-4.1.0-Windows-x86_64.exe. I think I do not have the 32bit Python. I would also like to clarify that I am not having any problem declaring or processing 5e8 floats or larger; it's only when I am trying to do something in parallel with them that the problems arise! Commented Jan 31, 2018 at 14:25
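As an aside to the bitness discussion in the comments: a quick, installer-independent way to check which interpreter is actually running is to inspect the pointer size. This is a minimal sketch using only the standard library:

```python
import struct
import sys

# Pointer size in bits: 64 on a 64-bit interpreter, 32 on a 32-bit one.
bits = struct.calcsize("P") * 8
print(bits)

# Cross-check: sys.maxsize exceeds 2**32 only on a 64-bit build.
print(sys.maxsize > 2**32)
```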

1 Answer


The problem may be in the data transfer to the child processes. When read-only objects are needed, I prefer to exploit the copy-on-write mechanism used by the underlying OS to manage the memory of the child processes. However, I don't know whether Windows 7 uses this mechanism. When copy-on-write is available, child processes can access areas of the parent process's memory without copying them. This trick works only if you access them in a read-only way and if the object is created before the processes are created.

Summing up, a possible solution (at least for Linux machines) is this:

import multiprocessing as mp
import numpy as np

def funcA():
    print "A {}".format(X.shape)
    # do something with the global variable X
    print 'funcA OK'

def funcB():
    print "B {}".format(X.shape)
    # do something else with the global variable X
    print 'funcB OK'

if __name__=='__main__':
    X=np.random.rand(int(5.00e8),)
    funcA() # OK
    funcB() # OK

    X=np.random.rand(int(2.65e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()

    X=np.random.rand(int(2.70e8),)
    P=[mp.Process(target=funcA), mp.Process(target=funcB)]
    for p in P:
        p.start()

    for p in P:
        p.join()

UPDATE: after various comments about compatibility problems with Windows, I sketched a new solution based solely on native memory maps. In this solution I create a numpy memory map backed by a file, which is shared through the file descriptor, so it doesn't require copying the whole array into the children. I found this solution much faster than using multiprocessing.Array!

UPDATE2: The code below has been updated to avoid memory issues during the randomisation of the memory map.

import multiprocessing as mp
import numpy as np
import tempfile

def funcA(X):
    print "A {}".format(X.shape)
    # do something with X
    print 'funcA OK'

def funcB(X):
    print "B {}".format(X.shape)
    # do something else with X
    print 'funcB OK'

if __name__=='__main__':
    dim = int(2.75e8)
    with tempfile.NamedTemporaryFile(delete=False) as tmpfile:  # platform-default temp dir, works on Windows too
        X = np.memmap(tmpfile, shape=dim, dtype=np.float32, mode='w+')  # create the memory map
        # init the map by chunks of size 1e8
        max_chunk_size = int(1e8)
        for start_pos in range(0, dim, max_chunk_size):
            chunk_size = min(dim-start_pos, max_chunk_size)
            X[start_pos:start_pos+chunk_size] = np.random.rand(chunk_size,)
        P=[mp.Process(target=funcA, args=(X,)), mp.Process(target=funcB, args=(X,))]
        for p in P:
            p.start()

        for p in P:
            p.join()

11 Comments

On my Windows machine this fails in the debug statement print "A {}".format(X.shape)
I suppose it happens because you are using Python 3, while this is Python 2 code. I tested the solution on my machine. Moreover, as I said in the answer, this code works only if X is created before the child processes are created (and before the two functions are called)
I switched to a Python 2 interpreter. The error is NameError: global name 'X' is not defined, even if I add a global X statement. (Python 2.7.11)
Are you defining X before calling funcA()? For instance, as I did in the main.
Windows Python does not use fork and hence cannot use the copy-on-write method. On Windows, creating a new Process instance spawns a new empty Python, which then loads up the same program as the original interpreter.
