
I wrote this parallel code to share the iterations in pairs (first and last, first+1 and last-1, ...), but I don't know how to improve the code in each of the two parallel sections, because each section contains an inner loop and I can't think of any way to simplify it. Thanks.

This isn't about which values are stored in x or y. I use this sections design because the requirement is to execute the iterations from 0 to N-1 in pairs: 0 and N-1, 1 and N-2, 2 and N-3, ... I would like to know whether I can optimize the inner loops while keeping this model.

int x = 0, y = 0,k,i,j,h;
#pragma omp parallel private(i, h) reduction(+:x, y)
    {
            #pragma omp sections
            {
                    #pragma omp section
                    {
                            for (i=0; i<N/2; i++)
                            {
                                    C[i] = 0;
                                    for (j=0; j<N; j++)
                                    {
                                        C[i] += MAT[i][j] * B[j];
                                    }
                                    x += C[i];
                            }
                    }
                    #pragma omp section
                    {
                            for (h=N-1; h>=N/2; h--) 
                            {
                                    C[h] = 0;
                                    for (k=0; k<N; k++)
                                    {
                                        C[h] += MAT[h][k] * B[k];
                                    }
                                    y += C[h];
                            }
                    }
            }
    }
    x = x + y;

3 Answers


Using sections seems like the wrong approach; a #pragma omp for seems more appropriate. Also note that you forgot to declare j private (and k, which the second section uses the same way).

int x = 0, y = 0,k,i,j;
#pragma omp parallel private(i,j) reduction(+:x, y)
{
#   pragma omp for nowait
    for(i=0; i<N/2; i++) {
        // local variable to make the life easier on the compiler
        int ci = 0;
        for(j=0; j<N; j++)
            ci += MAT[i][j] * B[j];
        x += ci;
        C[i] = ci; 
    }
#   pragma omp for nowait
    for(i=N/2; i < N; i++) {
        int ci = 0;
        for(j=0; j<N; j++)
            ci += MAT[i][j] * B[j];
        y += ci;
        C[i] = ci;
    }
}
x = x + y;

Also, I'm not sure, but if you only need x as your final output, you can simplify the code even further:

int x=0, i, j;
#pragma omp parallel for reduction(+:x) private(i,j)
for(i=0; i < N; ++i)
    for(j=0; j < N; ++j)
        x += MAT[i][j] * B[j];

4 Comments

The problem is that pairing the iterations (first and last, first+1 and last-1, ...) is a requirement, even though there are better alternatives; that's why I use sections without nowait.
@JamesR But why? Is your actual algorithm more complicated?
Instead of declaring i,j private, I'd declare them in the loop header: for (int i=whatever).
@VictorEijkhout My C knowledge is a bit rusty but AFAIR that wouldn't be valid C, right? And the post is tagged with C, not C++
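If the first/last pairing itself is the requirement, one way to express it without sections is to let each iteration of a worksharing loop handle one front row and its mirror row at the back: (0, N-1), (1, N-2), ... This is a swapped-in technique, not the sections design the question asks about, so it may not satisfy an exercise that specifically wants sections. The helper names row_dot and matvec_paired, and passing the matrix as a C99 variable-length array parameter, are assumptions for illustration. For odd N the middle row pairs with itself and is computed only once; rows are assigned to x and y exactly as in the question (rows below N/2 go to x, the rest to y). Which pairs run at the same time still depends on the schedule and thread count.

```c
/* Hypothetical helper: dot product of row r of mat with b. */
static int row_dot(int n, int mat[n][n], const int *b, int r)
{
    int s = 0;
    for (int j = 0; j < n; j++)
        s += mat[r][j] * b[j];
    return s;
}

/* Hypothetical sketch: iteration p handles the front row p and the
 * back row n-1-p, so each iteration processes one (first, last) pair. */
int matvec_paired(int n, int mat[n][n], const int *b, int *c)
{
    int x = 0, y = 0;
    int pairs = (n + 1) / 2;   /* odd n: the middle row is its own "pair" */

    #pragma omp parallel for reduction(+:x, y)
    for (int p = 0; p < pairs; p++) {
        int back = n - 1 - p;
        c[back] = row_dot(n, mat, b, back);
        y += c[back];                   /* rows n/2 .. n-1 go to y */
        if (p != back) {                /* skip the middle row for odd n */
            c[p] = row_dot(n, mat, b, p);
            x += c[p];                  /* rows 0 .. n/2 - 1 go to x */
        }
    }
    return x + y;
}
```

Compiled without OpenMP support, the pragma is ignored and the function computes the same result serially.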

The sections construct distributes different tasks to different threads, and each section block marks a separate task, so you will not be able to execute the iterations in the order you want. I answered you here:

Distribution of loop iterations between threads with a specific order

But I want to clarify that the requirement to use sections is that each block must be independent of the other blocks.



A section is executed by only one thread, so you can't parallelize the loops inside it. How about:

  1. Make a parallel loop over N at the top level,
  2. then, inside each iteration, use a conditional to decide whether to accumulate into x or y?
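The two steps above could look like the following sketch. The function name matvec_split_sum and its parameters are hypothetical stand-ins for the question's globals (N, MAT, B, C), and the matrix is passed as a C99 variable-length array parameter. The iteration schedule is left to OpenMP, so this gives up the first/last pairing but keeps x and y separate, with the same split as the question (rows below N/2 go to x).

```c
/* Hypothetical sketch: one parallel loop over all rows; a conditional
 * decides whether each row's sum goes into x or into y. */
int matvec_split_sum(int n, int mat[n][n], const int *b, int *c,
                     int *x_out, int *y_out)
{
    int x = 0, y = 0;

    #pragma omp parallel for reduction(+:x, y)
    for (int i = 0; i < n; i++) {
        int ci = 0;                 /* local accumulator for row i */
        for (int j = 0; j < n; j++) /* j declared here, so it is private */
            ci += mat[i][j] * b[j];
        c[i] = ci;
        if (i < n / 2)
            x += ci;                /* rows 0 .. n/2 - 1, like the first section */
        else
            y += ci;                /* rows n/2 .. n-1, like the second section */
    }
    *x_out = x;
    *y_out = y;
    return x + y;
}
```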

That said, @Homer512's solution also looks correct to me.

5 Comments

This isn't about which values are stored in x or y. I use this sections design because the requirement is to execute the iterations from 0 to N-1 in pairs: 0 and N-1, 1 and N-2, 2 and N-3, ... But I would like to know whether I can optimize the inner loops while keeping this model.
Why do iterations need to be executed like that? I see nothing in the code that requires it.
Nothing special, it's a request from my teacher
It doesn't make sense to insist on a sequential ordering in a parallel program. Anyway, if it really needs to be done in that sequence, then you need to make sure you limit it to two threads, and then your original code is a (or the) correct solution. But it seems like a pointless exercise to me.
I think it's only to make sure we learn how sections work.
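If the exercise really is about sections, here is a minimal sketch of the question's two-section code with the fixes raised in this thread: the inner-loop counters (j and k in the original) are declared in the for headers, which is valid C since C99 and makes them private; each row is summed into a local variable before being stored; and num_threads(2) requests one thread per section. The function name matvec_sections and its parameters are hypothetical stand-ins for the question's globals N, MAT, B and C, and int elements are assumed as in the question. Even with two threads, the sections are not synchronized row by row, so rows 0 and N-1 are not guaranteed to be processed at the same moment.

```c
/* Hypothetical sketch: the question's two-section layout with private
 * inner-loop counters and a local accumulator per row. */
int matvec_sections(int n, int mat[n][n], const int *b, int *c)
{
    int x = 0, y = 0;

    /* num_threads(2): one thread per section (a larger team would idle). */
    #pragma omp parallel num_threads(2) reduction(+:x, y)
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                /* Front half, ascending: rows 0, 1, ..., n/2 - 1 */
                for (int i = 0; i < n / 2; i++) {
                    int ci = 0;             /* local sum instead of c[i] += ... */
                    for (int j = 0; j < n; j++)
                        ci += mat[i][j] * b[j];
                    c[i] = ci;
                    x += ci;
                }
            }
            #pragma omp section
            {
                /* Back half, descending: rows n-1, n-2, ..., n/2 */
                for (int h = n - 1; h >= n / 2; h--) {
                    int ch = 0;
                    for (int k = 0; k < n; k++)
                        ch += mat[h][k] * b[k];
                    c[h] = ch;
                    y += ch;
                }
            }
        }
    }
    return x + y;   /* same as the question's final x = x + y */
}
```

Compiled without OpenMP support, the pragmas are ignored and the two halves simply run one after the other, giving the same result.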
