I'm trying to use OpenMP with the following C++ code:
double delta(10);
const bool parallel = true;
...
delta = 0;
#pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)
for (unsigned short y1 = 0; y1 < ny + 1; y1++)
for (unsigned short i = 0; i < delta_directions; i++)
for (unsigned short k = 0; k < gamma_directions; k++)
delta += pow(I[y1][i][k] - I_old[y1][i][k], 2);
delta = pow(delta,0.5);
cout << "delta=" << delta << endl;
ny, delta_directions and gamma_directions are declared before as const unsigned short and their values are defined. I and I_old are also defined before as:
double*** I;
I = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
I[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
for (unsigned short k(0); k < delta_directions; k++)
I[j][k] = new double[gamma_directions]();
double*** I_old;
I_old = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
I_old[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
for (unsigned short k(0); k < delta_directions; k++)
I_old[j][k] = new double[gamma_directions]();
and their values are defined.
In order to make a Linux executable file, I used a shell file with the following content:
g++ -O3 -std=c++11 -fopenmp main.cpp code/*.cpp -o run.exe
Strangely, when I ran the executable file on a Linux HPC cluster server, I got delta = 0 (wrong answer).
After I deleted the command
#pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)
I got a correct answer.
Even more strangely, It worked well when I used an executable which I prepared for Windows on my Windows PC, it worked well.
Can someone help me to understand what may cause the problem?
Edit: When I tried to write a minimal reproducible example everything worked properly. I'm attaching the code here, but the problem is that my original code doesn't work for specific combinations of delta_directions, ny and gamma_directions.
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <iomanip>
#include <fstream>
#include <math.h>
using namespace std;
int main()
{
// file settings
std::ofstream out("out.txt");
std::cout.rdbuf(out.rdbuf()); //redirect std::cout to out.txt
const bool parallel = true;
// numerical parameters
const unsigned short gamma_directions = 32;
const unsigned short delta_directions = 12;
const unsigned short ny = 40;
// initializing arrays and variables
double delta(10);
double*** I;
I = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
I[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
for (unsigned short k(0); k < delta_directions; k++)
I[j][k] = new double[gamma_directions]();
double*** I_old;
I_old = new double** [ny + 1];
for (unsigned short j(0); j < ny + 1; j++)
I_old[j] = new double* [delta_directions];
for (unsigned short j(0); j < ny + 1; j++)
for (unsigned short k(0); k < delta_directions; k++)
I_old[j][k] = new double[gamma_directions]();
// calculation on I
for (unsigned short i (0); i < delta_directions; i++)
for (unsigned short k(0); k < gamma_directions; k++)
I[0][i][k] = 0.5;
#pragma omp parallel for collapse(3) if(parallel)
for (unsigned short y1 = 0; y1 < ny + 1; y1++)
for (unsigned short i = 0; i < delta_directions; i++)
for (unsigned short k = 0; k < gamma_directions; k++)
I_old[y1][i][k] = I[y1][i][k];
// operations on I
#pragma omp parallel for collapse(3) if(parallel)
for (unsigned short y1 = 0; y1 < ny + 1; y1++)
for (unsigned short i = 0; i < delta_directions; i++)
for (unsigned short k = 0; k < gamma_directions; k++)
if (y1==1)
I[y1][i][k] = 3;
// error calculation
delta = 0;
#pragma omp parallel for collapse(3) reduction(+:delta) if(parallel)
for (unsigned short y1 = 0; y1 < ny + 1; y1++)
for (unsigned short i = 0; i < delta_directions; i++)
for (unsigned short k = 0; k < gamma_directions; k++)
delta += pow(I[y1][i][k] - I_old[y1][i][k], 2);
delta = sqrt(delta);
std::cout << "delta=" << delta << std::endl;
// deleting dynamic variables
delete [] I_old;
delete [] I;
}
Then I compiled it with: g++ -O3 -std=c++11 -fopenmp main.cpp -o run.exe
std:;vectorof the total lenght, and then use((i*n)+j)*m+ktranslation to a linear index. Or learn about C++23 andmdspan.pow(delta,0.5);consider usingsqrt(delta)a bit more obvious what your intent is and might be faster sincesqrthas exactly one job and can often be more easily optimized.unsigned short? It may cause overflows once the loop are collapsed, explaining the unexpected 0 result. Not to mention this is not faster to use them (in fact,intorlongcan be faster on mainstream CPUs). Also please note that usingunsignedcan decrease performance due to the wrap-arround preventing some compiler optimisation which are possible on signed integers.