
For the following code, I only want to parallelize the last part, which computes the second (Euclidean) norm of each vector (the vectors have different lengths), but I am getting a segmentation fault. I am also not sure whether I am using the reduction for the sum in the right place.

Another point: I think I only need to parallelize the outer loop, and there is no need to parallelize the inner loop. Right?

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <omp.h>
#include <fstream>
#include <cfloat>
#include <chrono>
using namespace std;

int main()
{
    int N = 1000000;       
    unsigned size;
    vector<vector<double>> c;
    default_random_engine g(0);
    uniform_real_distribution<double> d(0.0f, nextafter(1.0f, DBL_MAX));
    vector<double> b;
    
    for (int i = 0; i < N; i++) {
        size = pow(10, i % 4);
        vector<double> a;

        for (int j = 0; j < size; j++) {
            double number = d(g);
            a.push_back(number);
        }

        c.push_back(a);
    }       
    
    int i, j;
    double sum; 

    #pragma omp parallel for num_threads(4) shared(N) private(i,j,c,b) reduction (+: sum)
    for (int i = 0; i <N ; i++) {
       double sum = 0;
       for (int j = 0; j < c[i].size();j++) {
           sum = sum + pow(c[i][j],2);
       }

       double n = sqrt(sum);
       b.push_back(n);
    }
}

1 Answer

The segmentation fault is caused by the private clause, which does not copy the vectors: it initializes them to default-constructed empty vectors. Use firstprivate instead if you want to perform a copy from the "master" thread. That being said, c can simply be shared here.

Furthermore, here are several important points:

  • sum must be initialized to 0 (outside the loop);
  • the sum variable declared inside the parallel loop shadows the sum variable outside it (the same applies to i and j);
  • there is no need to declare a local sum for the reduction: OpenMP does it for you;
  • you can move a to avoid unnecessary copies, and reserve its size before filling it (faster);
  • N does not need to be shared between threads (it is better to give each thread a local copy);
  • since b is private, adding values to it is useless unless it is read locally in each thread (it depends on what you want to do). If you want to read b outside the parallel region, you either need to add a critical section, to merge the thread-local vector parts manually (faster), or to use direct indexed assignment (the simplest solution, and probably the fastest here).

Here is the corrected code:

#include <iostream>
#include <vector>
#include <random>
#include <cmath>
#include <omp.h>
#include <fstream>
#include <cfloat>
#include <chrono>
using namespace std;

int main()
{
    const int N = 1000000;
    vector<vector<double>> c;
    default_random_engine g(0);
    uniform_real_distribution<double> d(0.0f, nextafter(1.0f, DBL_MAX));
    c.reserve(N);

    for (int i = 0; i < N; i++) {
        const unsigned size = pow(10, i % 4);
        vector<double> a;
        a.reserve(size);

        for (int j = 0; j < size; j++) {
            const double number = d(g);
            a.push_back(number);
        }

        c.push_back(std::move(a));
    }

    double sum = 0.0;
    vector<double> b(N);

    #pragma omp parallel num_threads(4) firstprivate(N) shared(b,c,sum)
    {
        #pragma omp for reduction(+:sum)
        for (int i = 0; i < N ; i++) {
            double sumLocal = 0.0;

            for (int j = 0; j < c[i].size();j++) {
                sumLocal += pow(c[i][j], 2);
            }

            const double n = sqrt(sumLocal);
            b[i] = n;        // indexed write: each iteration owns its own slot

            sum += sumLocal; // merged across threads by the reduction
        }
    }
}

6 Comments

@dreamcrash Actually, using firstprivate can be faster in this case because compilers often have issues with constant references (caused by OpenMP's shared): constant propagation fails since referenced values are not considered immutable, so the value is constantly reloaded into registers. In the worst case, this prevents vectorization of the code, resulting in lower performance. You can see this effect with Clang here.
@JérômeRichard Wow, that is pretty interesting, nice to know. Thanks for the link.
@JérômeRichard In the case that I want to parallelize the first part of the code, which generates the random numbers, I think I should also use firstprivate(N) and shared(a,c) there. Right?
@JérômeRichard I am getting a memory error related to the parallelization of the first part. Can you modify this code or should I ask a new question?
This is a bit more complex in this case, as random number engines are mostly sequential, so you cannot use one global engine here but rather one per thread, for example. Consequently, the results will differ from the sequential ones.
