
I want to multithread a for loop in some of my C code, but I'm not really sure how to create a private array for each thread in C. In C++ I can just write something like #pragma omp parallel firstprivate(vec) on a std::vector, but I'm not sure how to do it in C, since I pass a pointer to the array into the function as an argument. If I were to just write firstprivate(array) on an int* parameter, I would get a bunch of pointers that all point to the same array, not a bunch of threads that each work on their own separate array.

My problem looks something like this:

void pbfs(int n, int* ver, int* edges, int* p, int* dist, int* S, int* T)
{
    int i, j;          // Loop indices
    int v, w;          // Current vertex and one of its neighbors
    int num_r, num_w;  // Number of vertices in S and T, respectively
    int* temp;         // Temporary pointer used to swap S and T

    ...

    while (num_r != 0) {               // Loop until all vertices have been discovered
        for (i = 0; i < num_r; i++) {           // Loop over vertices in S
            v = S[i];                      // Grab next vertex v in S
            for (j = ver[v]; j < ver[v + 1]; j++) { // Go through the neighbors of v
                w = edges[j];                // Get next neighbor w of v
                if (p[w] == -1) {            // Check if w is undiscovered
                    p[w] = v;                  // Set v as the parent of w
                    dist[w] = dist[v] + 1;       // Set distance of w
                    T[num_w++] = w;            // Add w to T and increase number of vertices discovered
                }
            }  // End loop over neighbors of v
        }  // End loop of vertices in S
        temp = S;  // Swap S and T
        S = T;
        T = temp;
        num_r = num_w; // Set number of elements in S
        num_w = 0;     // Set T as empty
    } //  End loop over entire graph
}

I'm thinking that putting a pragma on the 2nd for loop would speed up this function. The problem is that T[num_w++] = w; is not thread safe. I was thinking of giving each thread its own copy of T and then merging the copies, but I'm not sure how to merge all the T's afterwards either.

  • The 2nd loop has a very small workload, so the parallel overhead will most probably be far bigger than any gain from parallelization. What is the typical number of vertices/neighbors? If you create a minimal reproducible example (using typical numbers) we can help you parallelize your code. Commented Mar 29, 2022 at 5:36
  • What is your main goal? To have a fast code or to learn how to use OpenMP? Commented Mar 29, 2022 at 6:03
  • My goal is to learn OpenMP, not necessarily having fast code. I'm basically trying to take my BFS code and multithread it and see how it works out :) Commented Mar 29, 2022 at 14:20
  • Well, you have chosen a very complex problem for studying OpenMP... Commented Mar 29, 2022 at 18:57

1 Answer


To protect T[num_w++] = w; inside a parallel loop you have two options:

  1. Use atomic operations:
int local_num_w;
#pragma omp atomic capture
local_num_w = num_w++;
T[local_num_w] = w;
  2. Use a local array (local_T) and a local counter (local_num_w) for each thread, fill that local array, and merge all the local arrays into the shared T array:
#pragma omp parallel
{ 
   int local_T[T_SIZE];
   int local_num_w=0;
   #pragma omp for
   for(...) {       
      ....    
      local_T[local_num_w++] = w;            
   }

   int index;   //starting index in T array
   #pragma omp atomic capture
   {index = num_w; num_w += local_num_w;}
   
   //copy the local array to T
   for(int i=0;i<local_num_w;++i)
     T[index+i]=local_T[i];
}

Depending on the actual problem, either option 1 or option 2 will be faster. Note that the workload is probably very small in your example, and OpenMP has significant overhead, so I would not be surprised if the serial version turned out to be the fastest choice in your case. Note also that when you read/write the arrays p and dist you have to use atomic operations to avoid a data race.


7 Comments

I'm not sure if I understood correctly, but trying to implement this myself seems to lock up. This is what my implementation looks like: pastebin.com/U348ABg6
My question is the following: for a given i, are v and w unique, or can different i produce the same v or w value? In other words, are the mappings i->v and i->w injective?
For each i we pick a different item in the "queue" (the array here), so for each i we get a different v. I'm not 100% sure about w, in case some vertex shares an edge with another vertex, but I don't think so. At the end we get a new "queue" via the same operations.
I think it cannot be parallelized efficiently if w is not unique. In that case a critical section has to be used: godbolt.org/z/vMqMW3Kze
Did you mean to check if (p[w] == -1) { twice?