Summation problem when using OpenMP to parrallelize C++ code

Ask Question

Asked 1 year, 3 months ago

Modified 1 year, 3 months ago

Viewed 91 times

I'm trying to use OpenMP with the following C++ code:

double delta(10);
const bool parallel = true;
...
delta = 0;
#pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)
for (unsigned short y1  = 0; y1 < ny + 1; y1++)
    for (unsigned short i = 0; i < delta_directions; i++)
        for (unsigned short k = 0; k < gamma_directions; k++)
            delta += pow(I[y1][i][k] - I_old[y1][i][k], 2);
delta = pow(delta,0.5);
cout << "delta=" << delta << endl;

ny, delta_directions and gamma_directions are declared before as const unsigned short and their values are defined. I and I_old are also defined before as:

double*** I;
I = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
    I[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
    for (unsigned short k(0); k < delta_directions; k++)
        I[j][k] = new double[gamma_directions]();

double*** I_old;
I_old = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
    I_old[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
    for (unsigned short k(0); k < delta_directions; k++)
        I_old[j][k] = new double[gamma_directions]();

and their values are defined.

In order to make a Linux executable file, I used a shell file with the following content:

g++ -O3 -std=c++11 -fopenmp main.cpp code/*.cpp -o run.exe

Strangely, when I ran the executable file on a Linux HPC cluster server, I got delta = 0 (wrong answer).

After I deleted the command

#pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)

I got a correct answer.

Even more strangely, It worked well when I used an executable which I prepared for Windows on my Windows PC, it worked well.

Can someone help me to understand what may cause the problem?

Edit: When I tried to write a minimal reproducible example everything worked properly. I'm attaching the code here, but the problem is that my original code doesn't work for specific combinations of delta_directions, ny and gamma_directions.

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <fstream>
#include <math.h>

using namespace std;

int main()
{
    // file settings
    std::ofstream out("out.txt");
    std::cout.rdbuf(out.rdbuf()); //redirect std::cout to out.txt
    const bool parallel = true;

    // numerical parameters
    const unsigned short gamma_directions = 32;
    const unsigned short delta_directions = 12;
    const unsigned short ny = 40;

    // initializing arrays and variables
    double delta(10);

    double*** I;
    I = new double** [ny + 1];
    for (unsigned short j(0); j < ny + 1; j++)
        I[j] = new double* [delta_directions];
    for (unsigned short j(0); j < ny + 1; j++)
        for (unsigned short k(0); k < delta_directions; k++)
            I[j][k] = new double[gamma_directions]();

    double*** I_old;
    I_old = new double** [ny + 1];
    for (unsigned short j(0); j < ny + 1; j++)
        I_old[j] = new double* [delta_directions];
    for (unsigned short j(0); j < ny + 1; j++)
        for (unsigned short k(0); k < delta_directions; k++)
            I_old[j][k] = new double[gamma_directions]();

    // calculation on I
    for (unsigned short i (0); i < delta_directions; i++)
        for (unsigned short k(0); k < gamma_directions; k++)
            I[0][i][k] = 0.5;

    #pragma omp parallel for collapse(3) if(parallel)
    for (unsigned short y1 = 0; y1 < ny + 1; y1++)
        for (unsigned short i = 0; i < delta_directions; i++)
            for (unsigned short k = 0; k < gamma_directions; k++)
                I_old[y1][i][k] = I[y1][i][k];

    // operations on I
    #pragma omp parallel for collapse(3) if(parallel)
    for (unsigned short y1 = 0; y1 < ny + 1; y1++)
        for (unsigned short i = 0; i < delta_directions; i++)
            for (unsigned short k = 0; k < gamma_directions; k++)
                if (y1==1)
                    I[y1][i][k] = 3;

    // error calculation
    delta = 0;
    #pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)
    for (unsigned short y1  = 0; y1 < ny + 1; y1++)
        for (unsigned short i = 0; i < delta_directions; i++)
            for (unsigned short k = 0; k < gamma_directions; k++)
                delta += pow(I[y1][i][k] - I_old[y1][i][k], 2);
    delta = sqrt(delta);

    std::cout << "delta=" << delta << std::endl;

    // deleting dynamic variables
    delete [] I_old;
    delete [] I;
}

Then I compiled it with: g++ -O3 -std=c++11 -fopenmp main.cpp -o run.exe

edited Aug 21, 2024 at 11:27

asked Aug 21, 2024 at 0:13

Elad

233 bronze badges

1

Edit your question with a minimal reproducible example. We need something we can copy/paste to compile and run locally.

Stephen Newell
– Stephen Newell

2024-08-21 00:26:30 +00:00
Commented Aug 21, 2024 at 0:26
3

All that double and triple star stuff is very bad C++. Allocate a std:;vector of the total lenght, and then use ((i*n)+j)*m+k translation to a linear index. Or learn about C++23 and mdspan.

Victor Eijkhout
– Victor Eijkhout

2024-08-21 00:45:25 +00:00
Commented Aug 21, 2024 at 0:45
3

totally unrelated: rather than pow(delta,0.5); consider using sqrt(delta) a bit more obvious what your intent is and might be faster since sqrt has exactly one job and can often be more easily optimized.

user4581301
– user4581301

2024-08-21 00:48:23 +00:00
Commented Aug 21, 2024 at 0:48
2

Why do you use unsigned short? It may cause overflows once the loop are collapsed, explaining the unexpected 0 result. Not to mention this is not faster to use them (in fact, int or long can be faster on mainstream CPUs). Also please note that using unsigned can decrease performance due to the wrap-arround preventing some compiler optimisation which are possible on signed integers.

Jérôme Richard
– Jérôme Richard

2024-08-21 02:08:27 +00:00
Commented Aug 21, 2024 at 2:08
2

32*128*200 > 256*256. You cannot span the collapsed iteration space with short int, as mentioned by Jerome. Just use int for the iteration variables and your problems will be gone.

Joachim
– Joachim

2024-08-21 14:40:17 +00:00
Commented Aug 21, 2024 at 14:40

| Show 11 more comments

0 Your Answer

Sign up or log in

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.

Collectives™ on Stack Overflow

Summation problem when using OpenMP to parrallelize C++ code

0

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

0

Know someone who can answer? Share a link to this question via email, Twitter, or Facebook.

Your Answer

Sign up or log in

Post as a guest