I want to transport a struct between processes, and for that I am trying to create an MPI struct datatype. The code is for an Ant Colony Optimization (ACO) algorithm.

The header file with the C struct contains:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>         // offsetof
    #include <sys/time.h>
    #include <math.h>
    #include <mpi.h>

    /* Constants */
    #define NUM_CITIES 100      // Number of cities
    //among others

    typedef struct {
        int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
        double tour_distance;
    } ACO_Ant;

I tried to build my code as suggested in this thread.

Program code:

    int main(int argc, char *argv[])
    {
    MPI_Datatype MPI_TABU, MPI_PATH, MPI_ANT;

    // Initialize MPI
    MPI_Init(&argc, &argv);
    //Determines the size (&procs) of the group associated with a communicator (MPI_COMM_WORLD)
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    //Determines the rank (&rank) of the calling process in the communicator (MPI_COMM_WORLD)
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_TABU);
    MPI_Type_contiguous(NUM_CITIES, MPI_INT, &MPI_PATH);
    MPI_Type_commit(&MPI_TABU);
    MPI_Type_commit(&MPI_PATH);

    // Create ant struct
    //int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    //double tour_distance;
    int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
    MPI_Datatype    types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
    MPI_Aint        offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};

    MPI_Datatype tmp_type;
    MPI_Aint lb, extent;

    MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
    MPI_Type_get_extent( tmp_type, &lb, &extent );
    //Tried all of these
    MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
    MPI_Type_commit(&MPI_ANT);

    printf("Return: %d\n" , MPI_Bcast(ant, NUM_ANTS, MPI_ANT, 0, MPI_COMM_WORLD));
    }

But once the program reaches the MPI_Bcast call, it crashes with error code 11, which I first presumed was MPI_ERR_TOPOLOGY as per this manual but which is actually a segfault (signal 11).

I am also unsure about part of the original program's code. Can someone explain why its author declares

    MPI_Aint displacements[3];
    MPI_Datatype typelist[3];

with size 3 when the struct has only 2 members and block_lengths has size 2?

    int block_lengths[2];

Code:

    void ACO_Build_best(ACO_Best_tour *tour, MPI_Datatype *mpi_type /*out*/)
    {
        int block_lengths[2];
        MPI_Aint displacements[3];
        MPI_Datatype typelist[3];
        MPI_Aint start_address;
        MPI_Aint address;

        block_lengths[0] = 1;
        block_lengths[1] = NUM_CITIES;

        typelist[0] = MPI_DOUBLE;
        typelist[1] = MPI_INT;

        displacements[0] = 0;

        MPI_Address(&(tour->distance), &start_address);
        MPI_Address(tour->path, &address);
        displacements[1] = address - start_address;

        MPI_Type_struct(2, block_lengths, displacements, typelist, mpi_type);
        MPI_Type_commit(mpi_type);
    }

Any and all help will be appreciated.
Edit: help with solving the problem, not marginally useful StackOverflow jargon

  • Please include a Minimal, Complete, and Verifiable example. Commented Apr 14, 2019 at 17:37
  • Edited. Added more code for completeness. Commented Apr 14, 2019 at 17:58
  • Either blocklengths or types is incorrect. Commented Apr 14, 2019 at 22:25
  • 11 is more likely related to the SIGSEGV signal Commented Apr 15, 2019 at 0:04
  • You are correct on both occasions. See Hristo Iliev's answer for more information. Commented Apr 15, 2019 at 13:43

1 Answer

This part is wrong:

    int blocklengths[6] = {1,1, NUM_CITIES, NUM_CITIES, 1, 1};
    MPI_Datatype    types[6] = {MPI_INT, MPI_INT, MPI_TABU, MPI_PATH, MPI_INT, MPI_DOUBLE};
    MPI_Aint        offsets[6] = { offsetof( ACO_Ant, city ), offsetof( ACO_Ant, next_city), offsetof( ACO_Ant, tabu), offsetof( ACO_Ant, path ), offsetof( ACO_Ant, path_index ), offsetof( ACO_Ant, tour_distance )};

The MPI_TABU and MPI_PATH datatypes already cover NUM_CITIES elements. When you specify the corresponding block size to also be NUM_CITIES, the resultant datatype will try to access NUM_CITIES * NUM_CITIES elements, likely resulting in a segfault (signal 11).

Either set all elements of blocklengths to 1 or replace MPI_TABU and MPI_PATH in the types array with MPI_INT.
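How far the original description overshoots can be sketched in plain C, with no MPI calls. The helper name `block_end` is hypothetical, and the sketch assumes that MPI_INT corresponds to a C int and that MPI_TABU is NUM_CITIES contiguous ints, as in the question:

```c
#include <stddef.h>   /* offsetof, size_t */

#define NUM_CITIES 100

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* Hypothetical helper: byte offset just past the last byte touched by
 * one struct block, given the block's offset, its block length, and
 * the size in bytes of one element of its type. */
static size_t block_end(size_t offset, size_t blocklength, size_t elem_bytes)
{
    return offset + blocklength * elem_bytes;
}

/* As in the question: block length NUM_CITIES of a type that is itself
 * NUM_CITIES ints wide, i.e. NUM_CITIES * NUM_CITIES ints in total. */
static size_t tabu_end_original(void)
{
    return block_end(offsetof(ACO_Ant, tabu), NUM_CITIES,
                     NUM_CITIES * sizeof(int));
}

/* After the fix: block length NUM_CITIES of plain ints (equivalently,
 * block length 1 of the NUM_CITIES-int contiguous type). */
static size_t tabu_end_fixed(void)
{
    return block_end(offsetof(ACO_Ant, tabu), NUM_CITIES, sizeof(int));
}
```

With the fix, the tabu block ends exactly where path begins. With the original values it ends NUM_CITIES times further out, well beyond the end of a single ACO_Ant, which is consistent with the segfault.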

This part is also wrong:

    MPI_Type_create_struct(6, blocklengths, offsets, types, &tmp_type);
    MPI_Type_get_extent( tmp_type, &lb, &extent );
    //Tried all of these
    MPI_Type_create_resized( tmp_type, lb, extent, &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );
    //MPI_Type_create_resized( tmp_type, 0, sizeof(ant), &MPI_ANT );
    MPI_Type_commit(&MPI_ANT);

Calling MPI_Type_create_resized with the values returned by MPI_Type_get_extent is meaningless since it just duplicates the type without actually resizing it. Using sizeof(MPI_ANT) is wrong since MPI_ANT is not a C type but an MPI handle, which is either an integer index or a pointer (implementation-dependent). It will work with sizeof(ant) if ant is of type ACO_Ant, but given you call MPI_Bcast(ant, NUM_ANTS, ...), then ant is either a pointer, in which case sizeof(ant) is just the pointer size, or it is an array, in which case sizeof(ant) is NUM_ANTS times larger than it must be. The correct call is:

    MPI_Type_create_resized(tmp_type, 0, sizeof(ACO_Ant), &ant_type);
    MPI_Type_commit(&ant_type);
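The sizeof pitfalls described above can be checked in plain C without MPI. This is only a sketch: the value of NUM_ANTS is an assumption (the question does not show it), and the function names are hypothetical.

```c
#include <stddef.h>   /* size_t, NULL */

#define NUM_CITIES 100
#define NUM_ANTS   8   /* assumed value; not shown in the question */

typedef struct {
    int city, next_city, tabu[NUM_CITIES], path[NUM_CITIES], path_index;
    double tour_distance;
} ACO_Ant;

/* Byte offset of element i in a buffer whose element type has the
 * given extent. A count > 1 in a communication call steps through
 * the buffer in multiples of the extent like this. */
static size_t element_offset(size_t i, size_t extent)
{
    return i * extent;
}

/* sizeof applied to an array yields the size of the whole array,
 * i.e. NUM_ANTS times the size of one element. */
static size_t size_of_array(void)
{
    ACO_Ant ants[NUM_ANTS];
    return sizeof(ants);
}

/* sizeof applied to a pointer yields only the size of the pointer. */
static size_t size_of_pointer(void)
{
    ACO_Ant *ants = NULL;
    return sizeof(ants);
}
```

Only sizeof(ACO_Ant) makes element_offset(i, extent) land on the start of the i-th ant in a C array, which is why it is the right extent for the resized type.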

And please, never use MPI_ as a prefix in your own variable or function names. This makes the code unreadable and is very misleading ("is that a predefined MPI datatype or a user-defined one?").

As for the last question, the author might have had a different structure in mind. Nothing stops you from using larger arrays as long as you call MPI_Type_create_struct (or the older MPI_Type_struct used in that code) with the correct number of significant elements as its first argument.

Note: You don't have to commit MPI datatypes that are never used directly in communication calls. I.e., those two lines are unnecessary:

    MPI_Type_commit(&MPI_TABU);
    MPI_Type_commit(&MPI_PATH);

1 Comment

Thank you for the detailed answer! I edited the code as you suggested, and it works. Note that I picked up the MPI_(name) style from the original author, since I wanted my code to be compatible with his (I am using some of the same functions, etc.) and also because it is my first ever MPI project. I did, however, find it confusing at times, as you said. And yes, I wasn't expecting "MPI_Type_create_resized( tmp_type, 0, sizeof(MPI_ANT), &MPI_ANT );" to work since the MPI type is empty at that point, but I gave it a go anyway, since I was sure the code before that was correct (apparently not).
