
I want to multithread a for loop in some of my C code, but I'm not really sure how to create a private array for each thread in C. In C++ I can just write something like #pragma omp parallel firstprivate(vec) on a std::vector, but I'm not sure how to do it in C, since I pass a pointer to the array into the function as an argument. If I were to just write firstprivate(array) on an int* parameter, I would get a bunch of pointers that all point to the same array, not a bunch of threads that each work on their own separate array.

My problem looks something like this:

void pbfs(int n, int* ver, int* edges, int* p, int* dist, int* S, int* T)
{
    int i, j;          // Loop indices
    int v, w;          // Current vertex and one of its neighbors
    int num_r, num_w;  // Number of vertices in S and T, respectively
    int* temp;         // Temporary pointer used to swap S and T

    ...

    while (num_r != 0) {               // Loop until all vertices have been discovered
        for (i = 0; i < num_r; i++) {           // Loop over vertices in S
            v = S[i];                      // Grab next vertex v in S
            for (j = ver[v]; j < ver[v + 1]; j++) { // Go through the neighbors of v
                w = edges[j];                // Get next neighbor w of v
                if (p[w] == -1) {            // Check if w is undiscovered
                    p[w] = v;                  // Set v as the parent of w
                    dist[w] = dist[v] + 1;       // Set distance of w
                    T[num_w++] = w;            // Add w to T and increase number of vertices discovered
                }
            }  // End loop over neighbors of v
        }  // End loop of vertices in S
        temp = S;  // Swap S and T
        S = T;
        T = temp;
        num_r = num_w; // Set number of elements in S
        num_w = 0;     // Set T as empty
    } //  End loop over entire graph
}

I'm thinking that putting a pragma on the 2nd for loop would speed up this function. The problem is that T[num_w++] = w; is not thread safe. I was thinking of giving each thread its own copy of T and then merging the copies, but I'm not sure how to merge all the T's afterwards either.

  • The 2nd loop has a very small workload, so the parallel overhead will most probably be far bigger than any gain from parallelization. What is the typical number of vertices/neighbors? If you create a minimal reproducible example (using typical numbers) we can help you parallelize your code. Commented Mar 29, 2022 at 5:36
  • What is your main goal? To have a fast code or to learn how to use OpenMP? Commented Mar 29, 2022 at 6:03
  • My goal is to learn OpenMP, not necessarily having fast code. I'm basically trying to take my BFS code and multithread it and see how it works out :) Commented Mar 29, 2022 at 14:20
  • Well, you have chosen a very complex problem for studying OpenMP... Commented Mar 29, 2022 at 18:57

1 Answer


To protect T[num_w++] = w; inside a parallel loop you have two options:

  1. Use atomic operations:
int local_num_w;
#pragma omp atomic capture
local_num_w = num_w++;
T[local_num_w] = w;
  2. Use a local array (local_T) and a local counter (local_num_w) for each thread, fill that local array, and merge all the local arrays into the shared T array:
#pragma omp parallel
{ 
   int local_T[T_SIZE];
   int local_num_w=0;
   #pragma omp for
   for(...) {       
      ....    
      local_T[local_num_w++] = w;            
   }

   int index;   //starting index in T array
   #pragma omp atomic capture
   {index = num_w; num_w += local_num_w;}
   
   //copy the local array to T
   for(int i=0;i<local_num_w;++i)
     T[index+i]=local_T[i];
}

Depending on the actual problem, either option 1 or option 2 will be faster. Note that the workload is probably very small in your example, and OpenMP has significant overhead, so I would not be surprised if the serial version turned out to be the fastest choice in your case. Note also that when you read/write the arrays p and dist you have to use atomic operations to avoid a data race.


7 Comments

I'm not sure if I understood correctly, but trying to implement this myself seems to lock up. This is what my implementation looks like: pastebin.com/U348ABg6
My question is the following: for a given i, are v and w unique, or can different i produce the same v or w value? In other words, are the mappings i->v and i->w injective?
For each i we pick a different item in the "queue" (the array here), so for each i we get a different v. I'm not 100% sure about w, in case some vertex shares an edge with another vertex, but I don't think so. At the end we get a new "queue" via the same operations.
I think it cannot be parallelized efficiently if w is not unique. In that case a critical section has to be used: godbolt.org/z/vMqMW3Kze
Did you mean to check if (p[w] == -1) { twice?