I'm making a program that takes an integer n and generates the first n Ulam numbers. I followed this guide about OpenMP.
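For reference, here is the usual definition of the sequence, which the code below is assumed to follow:

$$u_1 = 1,\qquad u_2 = 2,\qquad u_k = \min\{\, m > u_{k-1} : m = u_i + u_j \text{ for exactly one pair } i < j < k \,\}\quad (k \ge 3)$$

so the sequence starts 1, 2, 3, 4, 6, 8, 11, 13, 16, 18.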
This is the core function, single-threaded version:
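    // ulam[] holds the Ulam numbers found so far; size is how many there are.
    bool isulam (int n, int size) {
        int count = 0;
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++) {
                // ordered pairs: each sum a+b is seen as (i,j) and as (j,i)
                if (i != j && ulam[i]+ulam[j] == n) count++;
                if (count > 2) return false;   // more than one representation
            }
        return count;   // 0 or 2 here; converts to false/true
    }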
And this is my attempt at optimizing it with OpenMP:
    bool isulam (int n, int size) {
        int count = 0;
        bool toomany = false;
        for (int i = 0; i < size; i++)
            #pragma omp parallel for reduction (|:toomany)
            for (int j = 0; j < size; j++) {
                if (i != j && ulam[i]+ulam[j] == n) count++;
                if (count > 2) toomany = true;
            }
        if (count > 2) return false;
        return count;
    }
I'm compiling with g++ -Ofast -fopenmp. The output is correct, but the OpenMP version is much slower:
    $ time ompulam <<< 1000 > /dev/null
    real    0m22.211s
    user    0m39.697s
    sys     0m3.202s

    $ time ulam <<< 1000 > /dev/null
    real    0m7.073s
    user    0m7.017s
    sys     0m0.008s
What's happening? My CPU is an AMD E1-2500 (2 cores, 1400 MHz), which may not be the best, but I was hoping for a very different result. Is OpenMP only worth it on 4+ cores?
FWIW, with a plain #pragma omp parallel for (i.e. without the toomany reduction), the code runs in 18.583 s.
time calculates the total time, although I'm not sure; I remember once using time make -j8 and finding that the total was higher than with time make (1 job).
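As a rough reading of the timings posted above: `time` reports wall-clock time as real, and the CPU time used by all threads of the process, summed, as user and sys. Dividing CPU time by wall-clock time gives the average number of busy cores, and comparing CPU times gives the total work done:

$$\frac{39.697 + 3.202}{22.211} \approx 1.93 \qquad\text{vs.}\qquad \frac{7.017 + 0.008}{7.073} \approx 0.99$$

$$\frac{39.697 + 3.202}{7.017 + 0.008} = \frac{42.899}{7.025} \approx 6.1$$

So the OpenMP build keeps both cores busy most of the time but spends roughly 6 times as much total CPU time as the serial build, which is consistent with the extra time going into parallel overhead rather than into useful work.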