
I am trying to solve a problem with MPI, OpenMP, and a combination of the two. Both the MPI and the OpenMP versions run fine, but when I try the hybrid MPI-OpenMP version of the code I cannot get past the error shown at the end of this post.

MPI version

for(i=0;i<loop;i++)               
{
  for(j=start;j<end+1;j++)        
  {
    for(k=0;k<N;k++)              
    {
      if(j!=k)
      {
        dx=C[k*3+0]-C[j*3+0];
        dy=C[k*3+1]-C[j*3+1];
        dz=C[k*3+2]-C[j*3+2];

        d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

        dCube=pow(d,3);
        mSquare=pow(m,2);

        F[(j-start)*3+0]-=G*mSquare/dCube*dx;
        F[(j-start)*3+1]-=G*mSquare/dCube*dy;
        F[(j-start)*3+2]-=G*mSquare/dCube*dz;
      }
    }
  }
  for(j=0;j<length;j++)                           
  {
    for(k=0;k<3;k++)                               
    {
      a=F[j*3+k]/m;                               
      V[j*3+k]=V[j*3+k]+a*dt;                     
      Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;     
    }
  }
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}
  • loop: the number of times the whole computation is repeated (the i iterations)
  • Since the whole C table is needed for correct results, the start and end variables tell each node which part of the table it is responsible for
  • The MPI_Allgatherv at the end updates the C table with the new values stored in the Cw tables before the next i iteration starts
  • counts is the amount of data each node contributes to MPI_Allgatherv and displs is the position in C where each node's Cw data must be placed (see the sketch below)
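
For reference, here is a minimal sketch of how start, end, length, counts and displs could be derived for this decomposition, assuming N bodies are split as evenly as possible across the ranks (the helper names base, rem, rows and offset are hypothetical, not from the original code):

#include <stdlib.h>
#include <mpi.h>

int size, rank;
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

int *counts = malloc(size*sizeof(int));
int *displs = malloc(size*sizeof(int));
int base = N/size, rem = N%size, offset = 0;

for(int r=0; r<size; r++)
{
  int rows = base + (r < rem ? 1 : 0);   /* bodies owned by rank r          */
  counts[r] = rows*3;                    /* 3 doubles (x,y,z) per body      */
  displs[r] = offset;                    /* where rank r's data lands in C  */
  offset += counts[r];
}

int length = counts[rank]/3;             /* bodies this rank updates        */
int start  = displs[rank]/3;             /* first row of C owned by rank    */
int end    = start + length - 1;         /* last owned row (j runs to end)  */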

OpenMP version

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
  for(i=0;i<loop;i++)                          
  {
    #pragma omp for schedule(static)
      for(j=0;j<N;j++)                         
      {
        for(k=0;k<N;k++)                       
        {
          if(j!=k)
          {
            dx=C[k*3+0]-C[j*3+0];
            dy=C[k*3+1]-C[j*3+1];
            dz=C[k*3+2]-C[j*3+2];

            d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

            dCube=pow(d,3);
            mSquare=pow(m,2);

            F[j*3+0]-=G*mSquare/dCube*dx;
            F[j*3+1]-=G*mSquare/dCube*dy;
            F[j*3+2]-=G*mSquare/dCube*dz;
          }
        }
      }
    #pragma omp for schedule(static)
      for(j=0;j<N;j++)                        
      {
        for(k=0;k<3;k++)                      
        {
          a=F[j*3+k]/m;                       
          V[j*3+k]=V[j*3+k]+a*dt;             
          C[j*3+k]=C[j*3+k]+V[j*3+k]*dt;      
        }
      }
  }
}

The i loop cannot be parallelised because every iteration needs a fully updated C, so the parallel work starts at the j loop.

Combined version

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
  {
    for(i=0;i<loop;i++)                                 
    {
      #pragma omp for schedule(static)
        for(j=start;j<end+1;j++)                        
        {
          for(k=0;k<N;k++)                              
          {
            if(j!=k)
            {
              dx=C[k*3+0]-C[j*3+0];
              dy=C[k*3+1]-C[j*3+1];
              dz=C[k*3+2]-C[j*3+2];

              d=sqrt(pow(dx,2)+pow(dy,2)+pow(dz,2));

              dCube=pow(d,3);
              mSquare=pow(m,2);

              F[(j-start)*3+0]-=G*mSquare/dCube*dx;
              F[(j-start)*3+1]-=G*mSquare/dCube*dy;
              F[(j-start)*3+2]-=G*mSquare/dCube*dz;
            }
          }
        }
      #pragma omp for schedule(static)
        for(j=0;j<length;j++)                           
        {
          for(k=0;k<3;k++)                              
          {
            a=F[j*3+k]/m;                               
            V[j*3+k]=V[j*3+k]+a*dt;                     
            Cw[j*3+k]=C[(j+start)*3+k]+V[j*3+k]*dt;     
          }
        }
      MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
    }
  }

What I am trying to achieve here is to use MPI to tell each node which part of C to work on, as in the MPI version above, and then have OpenMP compute the Cw table corresponding to that part. MPI then gathers the results into the C table with MPI_Allgatherv and the whole thing repeats for loop iterations. The problem is that when I run the code above this error shows up:

Assertion failed in file src/mpid/ch3/src/ch3u_request.c at line 572: *(&incomplete) >= 0

= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 10389 RUNNING AT felix-desktop
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES

YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

The problem lies in the MPI_Allgatherv line, but I can't figure out what is causing it.


1 Answer


You are calling MPI_Allgatherv inside a parallel region. Consequently, that MPI routine is being called on the same communicator (i.e., MPI_COMM_WORLD) by multiple threads of the same process, which is not allowed by the MPI standard. Instead, you can ensure that MPI_Allgatherv is invoked by a single thread:

#pragma omp parallel private(i,k,dx,dy,dz,d,dCube,mSquare,a)
{
  ... 
  #pragma omp master 
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
  #pragma omp barrier 
}

The OpenMP master construct ensures that only one thread (the master thread) calls MPI_Allgatherv. To make sure there are no race conditions (i.e., threads touching the data involved in the MPI_Allgatherv call while it is in progress), an OpenMP barrier is added right after it. Hence, all threads of each process wait until the master thread of their process has finished the MPI_Allgatherv call.

If the MPI library you are using provides a thread support level of MPI_THREAD_SERIALIZED, you can use the single construct instead of master. In any case, ensure that the thread support level of your MPI implementation is at least MPI_THREAD_FUNNELED.
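
A minimal sketch of how to request and verify that thread support level at program startup (assuming the code currently calls plain MPI_Init; the variables requested and provided are illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
  int requested = MPI_THREAD_FUNNELED;   /* only the master thread will make MPI calls */
  int provided;

  MPI_Init_thread(&argc, &argv, requested, &provided);

  if(provided < requested)
  {
    /* The library cannot guarantee the requested level: stop here
       instead of crashing later inside a collective call. */
    fprintf(stderr, "MPI thread support too low: got %d, need %d\n", provided, requested);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }

  /* ... the hybrid MPI+OpenMP code goes here ... */

  MPI_Finalize();
  return 0;
}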

Another approach is to move the MPI_Allgatherv call outside of the parallel region:

for(i=0;i<loop;i++)
{
  #pragma omp parallel private(k,dx,dy,dz,d,dCube,mSquare,a)
  {
    ...
  }
  MPI_Allgatherv(&Cw[0],length*3,MPI_DOUBLE,&C[0],counts,displs,MPI_DOUBLE,MPI_COMM_WORLD);
}

The overhead of creating a parallel region in every iteration should be mitigated by the fact that a decent OpenMP implementation uses a thread pool, so the threads are only created once.
