
Suppose that I create custom MPI_Datatypes for subarrays of different sizes on each of the MPI processes allocated to a program. Now I wish to send these subarrays to the master process and assemble them into a bigger array block by block. The master process is unaware of the individual datatypes (defined by the local sizes) on the other processes. Naively, therefore, I might attempt to send over these custom datatypes to the master process in the following manner.

MPI_Datatype localarr_type;
MPI_Type_create_subarray( NDIMS, array_size, local_size, box_corner, MPI_ORDER_C, MPI_FLOAT, &localarr_type );
MPI_Type_commit(&localarr_type);

if (rank == master)
{
    for (int id = 1; id < nprocs; ++id)
    {
        MPI_Recv( &localarr_type, 1, MPI_Datatype,    id, tag1[id], comm_cart, MPI_STATUS_IGNORE );
        MPI_Recv( big_array,      1, localarr_type,  id, tag2[id], comm_cart, MPI_STATUS_IGNORE );
    }
}
else
{
    MPI_Send( &localarr_type, 1, MPI_Datatype,  master, tag1[rank], comm_cart );
    MPI_Send( local_box,      1, localarr_type, master, tag2[rank], comm_cart );
}

However, this results in a compilation error. GCC and Clang produce the first error message below; the Intel compiler produces the second.

/* GNU OR CLANG COMPILER */
error: unexpected type name 'MPI_Datatype': expected expression

/* INTEL COMPILER */
error: type name is not allowed

This means that either (1) I am attempting to send a custom MPI_Datatype to another process in the wrong way, or (2) this is not possible at all. I would like to know which it is, and if it is (1), what the correct way of communicating a custom MPI_Datatype is. Thank you.


Note.

  1. I am aware of other ways of solving the above problem without communicating MPI_Datatypes. For example, one could communicate the local array sizes and manually reconstruct each process's MPI_Datatype inside the master process before using it in the subsequent communication of subarrays. This is not what I am looking for.

  2. I wish to communicate the custom MPI_Datatype itself (as shown in the example above), not something that is an instance of the datatype (which is doable, as also shown in the example code above).


1 Answer


First of all: you cannot send a datatype like that. MPI_Datatype is a type name, not a value of type MPI_Datatype, so it cannot be passed where MPI expects a datatype handle such as MPI_FLOAT. (It's a cute idea, though.) You could send the parameters with which the type is constructed and then reconstruct it on the receiving side.
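
For example, a minimal sketch of that idea, reusing the names from the question and assuming local_size and box_corner are int arrays of length NDIMS (meta is an illustrative helper buffer):

/* Worker side: send the construction parameters, then the data as before. */
int meta[2 * NDIMS];
for (int d = 0; d < NDIMS; ++d) { meta[d] = local_size[d]; meta[NDIMS + d] = box_corner[d]; }
MPI_Send( meta,      2 * NDIMS, MPI_INT,       master, tag1[rank], comm_cart );
MPI_Send( local_box, 1,         localarr_type, master, tag2[rank], comm_cart );

/* Manager side (inside the loop over id): rebuild that worker's subarray type. */
int meta[2 * NDIMS];
MPI_Datatype remote_type;
MPI_Recv( meta, 2 * NDIMS, MPI_INT, id, tag1[id], comm_cart, MPI_STATUS_IGNORE );
MPI_Type_create_subarray( NDIMS, array_size, meta, meta + NDIMS,
                          MPI_ORDER_C, MPI_FLOAT, &remote_type );
MPI_Type_commit(&remote_type);
MPI_Recv( big_array, 1, remote_type, id, tag2[id], comm_cart, MPI_STATUS_IGNORE );
MPI_Type_free(&remote_type);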

However, you are probably misunderstanding the nature of MPI. By using the same datatype on the workers and the manager, your code implicitly assumes that every process has data of the same size and shape. That is not compatible with the manager gathering everything together.

If you're gathering data on a manager process (usually not a good idea: are you really sure you need that?), then each contributing process has its data in a small array, say at indices 0..99, and can send it as an ordinary contiguous buffer. The manager has a much larger array and places all the contributions at disjoint locations, so at most the manager needs to create subarray types to indicate where each received block goes in the big array.
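
In that scheme the workers need no derived type at all. A minimal sketch, assuming a 2-D decomposition, that each worker stores its block contiguously in local_box, and that the manager already knows (or has received) every worker's geometry in hypothetical arrays local_size_of[id] and box_corner_of[id]:

/* Worker: the block is contiguous, so send it as a plain buffer. */
MPI_Send( local_box, local_size[0] * local_size[1], MPI_FLOAT,
          master, tag2[rank], comm_cart );

/* Manager: one subarray type per worker says where the block lands in big_array. */
for (int id = 1; id < nprocs; ++id)
{
    MPI_Datatype placement;
    MPI_Type_create_subarray( NDIMS, array_size, local_size_of[id], box_corner_of[id],
                              MPI_ORDER_C, MPI_FLOAT, &placement );
    MPI_Type_commit(&placement);
    MPI_Recv( big_array, 1, placement, id, tag2[id], comm_cart, MPI_STATUS_IGNORE );
    MPI_Type_free(&placement);
}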


5 Comments

Thank you. I have tried the alternative you suggest and it works (see note 1). The workers don't have the same size/shape boxes; the idea was to send the size/shape information to the manager by sending the custom datatypes from every worker: but it seems that is not possible. Even though I have my answer, I am still curious why you say that gathering data on a manager process is not a good idea. Could you please point me in the right direction so that I can understand what you mean? Much obliged.
Well, it depends. If each worker has a gigabyte of data and there are 1000 processes, it's unlikely that the manager has enough space. Furthermore, gathering and processing data on a manager process is a bottleneck in the overall computation. So you must have a good reason for wanting to gather data. Maybe you do. Do you mind explaining why you are gathering data?
I see what you mean. I am updating fields on a lattice (solving an equation of motion), and in the end I intend to gather the updated fields on the manager to write the final configuration to disk (further analysis to be done with Python). Perhaps it is wiser to write out the subarrays in parallel and then stitch them together later? I don't know.
You could use MPI I/O to have each process write its part. Then you have only one file, and no stitching is needed. MPI I/O is not terribly hard, and in fact it's based on the same derived types that you already created above: each type becomes a "view" into the parallel file, and each process writes to its private view (see the sketch after these comments).
Thanks. I will look into it. :)
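
A minimal sketch of the MPI I/O approach from the last comments, assuming localarr_type is the subarray type from the question (this rank's block within the global array), local_box holds that block contiguously, and the decomposition is 2-D; the filename is illustrative:

MPI_File fh;
MPI_File_open( comm_cart, "fields.dat",
               MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh );

/* Each rank sees only its own block of the global file. */
MPI_File_set_view( fh, 0, MPI_FLOAT, localarr_type, "native", MPI_INFO_NULL );

/* Collective write: every rank writes its contiguous local data into its view. */
MPI_File_write_all( fh, local_box, local_size[0] * local_size[1], MPI_FLOAT,
                    MPI_STATUS_IGNORE );
MPI_File_close(&fh);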
