
I'm solving the Laplace equation on an unstructured mesh partitioned with MPI. My plan is to finish sending and receiving data from the neighboring partitions first and then do the computation on each processor. MPI_Waitall is used to wait for all MPI_Isend() and MPI_Irecv() calls to finish, but the problem is that all processors pass MPI_Waitall and then get stuck when reading the received buffer data, because no processor actually received any data (the flag from MPI_Testall is 0). In my understanding, MPI_Irecv should have received the data before MPI_Waitall returns.

double **sbuf = calloc(partition->ptn_nnbr[my_id], sizeof(double *));
double **rbuf = calloc(partition->ptn_nnbr[my_id], sizeof(double *));
for (i = 0; i < partition->ptn_nnbr[my_id]; i++)
{
    //rbuf[i] = calloc(partition->ptn_cnt[my_id][k1], sizeof(double));
    rbuf[i] = calloc(MAX_nnode, sizeof(double));
    sbuf[i] = calloc(MAX_nnode, sizeof(double));
}

nrm = 1;            // nrm = max(abs(r[i])), i = 1..n
iter = 0;
printf("Entering jacobi; iterations = %d, error norm = %e\n", iter, nrm);
while (nrm > TOL && iter < 4) {
    init_boundary_conditions_ptn(x_ptn, mesh, my_id, partition);

    iter++;     
    int req_idx= 0;     
    int idx = 0;
    MPI_Request *request = (MPI_Request *) calloc(2 * partition->ptn_nnbr[my_id], sizeof(MPI_Request));
    MPI_Status *status = calloc(2 * partition->ptn_nnbr[my_id], sizeof(MPI_Status));
    int *flag = calloc(2 * partition->ptn_nnbr[my_id], sizeof(int));
    for (k1 = 0; k1 < partition->nptn; k1++)
    {
        if (partition->ptn_list[my_id][k1] != NULL)
        {               
            for (i = 0; i < partition->ptn_cnt[k1][my_id]; i++)
            {
                sbuf[idx][i] = x_ptn->val[partition->ptn_list[k1][my_id][i] - partition->ptn[my_id] + 1];
            }
            MPI_Isend(sbuf[idx], partition->ptn_cnt[k1][my_id], MPI_DOUBLE, k1, TAG, MPI_COMM_WORLD, &request[req_idx]);
            //printf("isend done from nbr %d for partition %d \n", k1, my_id);
            req_idx++;              
            idx++;
        }
    }

    idx = 0;
    for (k1 = 0; k1 < partition->nptn; k1++)
    {
        if (partition->ptn_list[my_id][k1] != NULL)
        {
            MPI_Irecv(rbuf[idx], partition->ptn_cnt[my_id][k1], MPI_DOUBLE, k1, TAG, MPI_COMM_WORLD, &request[req_idx]);
            //printf("irecv done from nbr %d for partition %d \n", k1, my_id);
            req_idx++;

            idx++;
        }
    }

    printf("partition %d is waiting \n", my_id);
    MPI_Testall(2 * partition->ptn_nnbr[my_id],request,flag, status);
    for (i = 0; i < 2 * partition->ptn_nnbr[my_id]; i++)
    {
        printf("flag[%d] is %d from partition %d\n", i, flag[i], my_id);
    }

    MPI_Waitall(2 * partition->ptn_nnbr[my_id], request, status);
    printf("partition %d pass MPI_Wait \n", my_id);

    for (k1 = 0; k1 < partition->nptn; k1++)
    {
        if (partition->ptn_list[my_id][k1] != NULL)
        {
            MPI_Probe(k1, TAG, MPI_COMM_WORLD, status1);
            MPI_Get_count(status1, MPI_DOUBLE, &count);
            printf("count is %d from nbr %d \n", count, k1);
            for (i = 0; i < count; i++)
            {
                x->val[partition->ptn_list[my_id][k1][i]] = rbuf[idx][i];
            }
        }
    }    

    //printf("exchange complete from partition %d\n", my_id);

    jacobi_step_csr_matrix(A_ptn, x, b_ptn, y_ptn);     // y = inv(D)*(b + (D-A)*x), D = diag(A)

    copy_vector(y_ptn, x_ptn);
    MPI_Gatherv(x_ptn->val, x_ptn->n, MPI_DOUBLE, x->val, x_count, x_dis, MPI_DOUBLE,0, MPI_COMM_WORLD);
    if (my_id == 0)
    {
        init_boundary_conditions(x, mesh, partition->perm);
        matvec_csr_matrix(A, x, r);     // r = A*x
        sxapy(b, -1.0, r);          // r = b - r
        zero_boundary_conditions(r, mesh, partition->perm);
        nrm = norm_inf(r);
    }
    MPI_Bcast(&nrm, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    printf("nrm is %f from partition %d in iter %d \n", nrm, my_id, iter);
    free(request);
    free(status);
}

The output is:

Processor 0 start Jacobi 
MAx_node is 2 from partition 0 
Entering jacobi; iterations = 0, error norm = 1.000000e+00
Processor 2 start Jacobi 
MAx_node is 2 from partition 2 
Entering jacobi; iterations = 0, error norm = 1.000000e+00
Processor 3 start Jacobi 
MAx_node is 2 from partition 3 
Entering jacobi; iterations = 0, error norm = 1.000000e+00
Processor 1 start Jacobi 
MAx_node is 2 from partition 1 
Entering jacobi; iterations = 0, error norm = 1.000000e+00
partition 3 is waiting 
flag[0] is 0 from partition 3
flag[1] is 0 from partition 3
flag[2] is 0 from partition 3
flag[3] is 0 from partition 3
partition 3 pass MPI_Wait 
partition 0 is waiting 
flag[0] is 0 from partition 0
flag[1] is 0 from partition 0
flag[2] is 0 from partition 0
flag[3] is 0 from partition 0
partition 0 pass MPI_Wait 
partition 2 is waiting 
flag[0] is 0 from partition 2
flag[1] is 0 from partition 2
flag[2] is 0 from partition 2
flag[3] is 0 from partition 2
partition 2 pass MPI_Wait 
partition 1 is waiting 
flag[0] is 0 from partition 1
flag[1] is 0 from partition 1
flag[2] is 0 from partition 1
flag[3] is 0 from partition 1
partition 1 pass MPI_Wait 

1 Answer

It appears to me that your understanding of non-blocking communication in MPI is somewhat vague. First of all, you are using the test call incorrectly: MPI_Testall outputs a single scalar completion flag, which indicates whether all requests had completed by the time MPI_Testall was called, so passing an array and reading one flag per request is meaningless. Had you used MPI_Testsome instead, you would have noticed that only some of the requests (or, more likely, none) had completed. The MPI standard allows the progression of non-blocking operations to be postponed and only advanced on certain occasions. Completion is only guaranteed:

  • after a call to MPI_Wait{all|some|any} (which simply does not return before the requests have completed);
  • after MPI_Test{all|some|any} returns a true completion flag. There is no guarantee that a single call to MPI_Test... will result in completion; the test functions are meant to be called repeatedly until the flag indicates that the requests have completed (see the sketch after this list).
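
A minimal sketch of that polling pattern (the names are illustrative, not taken from your code); MPI_Waitall(nreq, req, stat) is the blocking equivalent of this loop:

#include <mpi.h>

/* Poll until every request in req[0..nreq-1] has completed.  Note that
 * the completion flag is a single scalar, not one flag per request,
 * and that MPI_Testall typically has to be called many times before it
 * reports completion. */
static void poll_until_done(int nreq, MPI_Request req[], MPI_Status stat[])
{
    int done = 0;
    while (!done)
        MPI_Testall(nreq, req, &done, stat);
}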

For performance reasons, most MPI libraries are single-threaded, that is, there is no background thread that progresses the non-blocking calls, except on some specific architectures that implement progression in hardware. Therefore, periodic calls into the MPI library are needed for non-blocking communications to actually happen, and your expectation that all non-blocking requests should have completed by the time you call MPI_Testall is simply wrong.
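
If you want to overlap useful local work with the exchange, a sketch could look like the following. The chunking and the update_chunk callback are assumptions for illustration; the repeated test calls merely nudge the library into progressing the transfers:

#include <mpi.h>

/* Interleave local work (e.g. updating interior nodes that need no
 * halo data) with the pending transfers, then block until the
 * exchange is finished. */
static void compute_while_exchanging(int nreq, MPI_Request req[],
                                     int nchunks,
                                     void (*update_chunk)(int chunk))
{
    int chunk, progressed;
    for (chunk = 0; chunk < nchunks; chunk++) {
        update_chunk(chunk);                                      /* local work */
        MPI_Testall(nreq, req, &progressed, MPI_STATUSES_IGNORE); /* nudge MPI */
    }
    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);                  /* finish exchange */
}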

Also, your program gets stuck in MPI_Probe. It is a blocking call that has to be made before receiving a message, not after: the message has already been received by MPI_Irecv, and the probe call is waiting for another message that never arrives. Do not call MPI_Probe; pass the relevant element of the status array to MPI_Get_count instead.
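
A sketch of what the unpacking step could look like without MPI_Probe; nsend, nrecv and copy_halo are illustrative names standing in for your own bookkeeping:

#include <mpi.h>

/* After MPI_Waitall, each receive's status entry already carries the
 * message size, so MPI_Get_count can be applied to it directly.  The
 * receive requests were posted after the nsend send requests, so the
 * status of the i-th receive is status[nsend + i]. */
static void unpack_halos(int nsend, int nrecv, MPI_Status status[],
                         double **rbuf,
                         void (*copy_halo)(int nbr, const double *buf, int n))
{
    int i, count;
    for (i = 0; i < nrecv; i++) {
        MPI_Get_count(&status[nsend + i], MPI_DOUBLE, &count);
        copy_halo(i, rbuf[i], count);   /* scatter rbuf[i][0..count-1] into x */
    }
}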

As a final note, you are passing 2 * partition->ptn_nnbr[my_id] as the number of requests. Make sure that this value actually matches the number accumulated in req_idx, otherwise your program will crash: inactive requests must be set to MPI_REQUEST_NULL, and neither Open MPI nor MPICH uses NULL (which is what calloc(3) puts there in your case) for inactive requests. You should pass req_idx as the number of requests instead.
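
For instance, a sketch of the alternative approach, initialising the whole array to MPI_REQUEST_NULL so that the full length can be passed safely (the helper name is illustrative):

#include <mpi.h>
#include <stdlib.h>

/* Allocate the request array at its maximum size and set every slot to
 * MPI_REQUEST_NULL; null requests are simply ignored by MPI_Waitall,
 * so passing the full length is safe even when fewer requests were
 * actually started.  Passing req_idx achieves the same thing. */
static MPI_Request *alloc_requests(int max_nreq)
{
    MPI_Request *req = malloc((size_t)max_nreq * sizeof *req);
    int i;
    if (req != NULL)
        for (i = 0; i < max_nreq; i++)
            req[i] = MPI_REQUEST_NULL;
    return req;
}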
