
I am using mpi4py to spread a processing task over a cluster of cores. My code looks like this:

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

'''Perform processing operations with each processor returning 
   two arrays of equal size, array1 and array2'''

all_data1 = comm.gather(array1, root = 0)
all_data2 = comm.gather(array2, root = 0)

This is returning the following error:

SystemError: Negative size passed to PyString_FromStringAndSize

I believe this error means that the data gathered into all_data1 exceeds the maximum message size that mpi4py can handle in a single call, which is quite possible.

I tried doing it in smaller pieces, as follows:

comm.isend(array1, dest = 0, tag = rank+1)
comm.isend(array2, dest = 0, tag = rank+2)
if rank == 0:
    for proc in xrange(size):
        partial_array1 = comm.irecv(source = proc, tag = proc+1)
        partial_array2 = comm.irecv(source = proc, tag = proc+2)

but this returns the following error:

[node10:20210] *** Process received signal ***
[node10:20210] Signal: Segmentation fault (11)
[node10:20210] Signal code: Address not mapped (1)
[node10:20210] Failing at address: 0x2319982b

followed by a whole load of unintelligible path-like information and a final message:

mpirun noticed that process rank 0 with PID 0 on node node10 exited on signal 11 (Segmentation fault).

This seems to happen regardless of how many processors I am using.

For similar questions in C the solution seems to be subtly changing the way the arguments to the recv call are passed. With Python the syntax is different, so I would be grateful if someone could clarify why this error is appearing and how to fix it.

  • Could you pickle and gzip your array on one side, and gunzip / unpickle it on the other side? Commented Nov 16, 2015 at 23:00
  • Correct me if I'm wrong, but isn't that the way mpi4py works to begin with? As I understand it, the data to be communicated is pickled 'under the hood'. Commented Nov 16, 2015 at 23:04
  • In theory, it should... Can you send anything at all from one side to another? Are all entities in the cluster similar to each other? Commented Nov 16, 2015 at 23:28
  • I just tried sending a test object and printing it at the other side, but it is showing up as <mpi4py.MPI.Request object at 0x7f7b125690e0> How do I unpack this object? This may be the cause of the problem. Commented Nov 16, 2015 at 23:49
  • how do you send and print it? Commented Nov 17, 2015 at 0:16

1 Answer


I managed to resolve the problem I was having by doing the following.

if rank != 0:
    # Non-blocking send; allows the code to continue before the data is received.
    # The arrays are NumPy float64, so the matching MPI datatype is MPI.DOUBLE
    # (MPI.FLOAT is 32-bit and would mismatch the buffer).
    comm.Isend([array1, MPI.DOUBLE], dest=0, tag=77)

if rank == 0:
    final_array1 = array1
    for proc in xrange(1, size):
        partial_array1 = np.empty(len(array1), dtype=float)
        # A blocking receive is necessary here to avoid a segfault.
        comm.Recv([partial_array1, MPI.DOUBLE], source=proc, tag=77)

        final_array1 += partial_array1

if rank != 0:
    comm.Isend([array2, MPI.DOUBLE], dest=0, tag=135)

if rank == 0:
    final_array2 = array2
    for proc in xrange(1, size):
        partial_array2 = np.empty(len(array2), dtype=float)
        comm.Recv([partial_array2, MPI.DOUBLE], source=proc, tag=135)

        final_array2 += partial_array2

comm.barrier()  # This barrier call resolves the segfault.
if rank == 0:
    return final_array1, final_array2
else:
    return None