Concurrency and optimization using OpenMP

Question

I'm learning OpenMP. To do so, I'm trying to make an existing code parallel. But I seems to get an worse time when using OpenMP than when I don't.

My inner loop:

    #pragma omp parallel for
    for(unsigned long j = 0; j < c_numberOfElements; ++j)
    {
        //int th_id = omp_get_thread_num();
        //printf("thread %d, j = %d\n", th_id, (int)j);

        Point3D current;
        #pragma omp critical
        {
            current = _points[j];
        }

        Point3D next = getNext(current);

        if (!hasConstraint(next))
        {
            continue;
        }

        #pragma omp critical
        {
            _points[j] = next;
        }
    }

_points is a pointMap_t, defined as:

typedef boost::unordered_map<unsigned long, Point3D> pointMap_t;

Without OpenMP my running time is 44.904s. With OpenMP enabled, on a computer with two cores, it is 64.224s. What I am doing wrong?

High Performance Mark · Accepted Answer · 2010-10-01 17:52:16Z

1

Why have you wrapped your reads and writes to _points[j] in critical sections ? I'm not much of a C++ programmer, but it doesn't look to me as if you need those sections at all. As you've written it (uunamed critical sections) each thread is going to wait while the other goes through each of the sections. This could easily make the program slower.

answered Oct 1, 2010 at 17:52

High Performance Mark

78.7k7 gold badges109 silver badges168 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Vitor Py Over a year ago

It segfaults without it. I don't think I can alter the map from two different threads concurrently.

Steve Townsend · Accepted Answer · 2010-10-01 17:53:55Z

0

It seems possible that the lookup and write to _points in critical sections is dragging down the performance when you use OpenMP. Single-threaded, this will not result in any contention.

Sharing seed data like this seems counterproductive in a parallel programming context. Can you restructure to avoid these contention points?

answered Oct 1, 2010 at 17:53

Steve Townsend

54.4k9 gold badges100 silver badges145 bronze badges

8 Comments

Vitor Py Over a year ago

There is another way to do it? I'm not used to parallel programming.

Steve Townsend Over a year ago

@Vitor - I cannot see why it would fault if you just remove the critical sections. Something in the code we cannot see maybe? _points is not being added to or removed from while the loop executes, right? Do the function calls have side effects outside the current entry? Where is the exception?

Oliver Charlesworth Over a year ago

Even if you remove the critical sections, it may still cause massive slow-down. See e.g. drdobbs.com/go-parallel/article/….

Vitor Py Over a year ago

@Steve Townsend From gdb: [New Thread 0x7ffff0ccd710 (LWP 13329)] *** glibc detected *** /home/vitorpy/openmp/test: double free or corruption (fasttop): 0x00000000027f8be0 ***. bt shows it's occurs on the unordered_map operator[]. Function calls have no side effects.

Vitor Py Over a year ago

@Oli Charlesworth I'll take a look at it. Thank you.

|

florin · Accepted Answer · 2010-10-01 19:48:54Z

0

You need to show the rest of the code. From a comment to another answer, it seems you are using a map. That is really a bad idea, especially if you are mapping 0..n numbers to values: why don't you use an array?

If you really need to use containers, consider using the ones from the Intel's Thread Building Blocks library.

answered Oct 1, 2010 at 19:48

florin

14.4k6 gold badges50 silver badges47 bronze badges

2 Comments

Vitor Py Over a year ago

You're right. The map is mostly a leftover from an older design where ID were not always sequential. I've made it into an array. It's still slow but I think I'm probably bumping into the false sharing issue or something.

florin Over a year ago

@Vitto Py: please try to create a small compilable code sample that shows the problem. You have simplified the code so much that you have abstracted the problem away. One thing to try would be to make "hasContraint" as static inline, to avoid paying the penalty of a function call.

Jonathan Dursi · Accepted Answer · 2010-10-01 21:56:24Z

I agree that it would be best to see some working code.

The ultimate issue here is that there are criticals within a parallel region, and criticals are (a) enormously expensive in and of themselves, and (b) by definition, kill parallelism. The assignment to current certainl doesn't need to be inside a critical, as it is private; I wouldn't have thought the _points[j] assignment would be, either, but I don't know what the map stuff does, so there you go.

But you have a loop in which you have a huge amount of overhead, which grows linearly in the number of threads (the two critical regions) in order to do a tiny amount of actual work (walk along a linked list, it looks like). That's never going to be a good trade...

Collectives™ on Stack Overflow

Concurrency and optimization using OpenMP

4 Answers 4

1 Comment

8 Comments

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

8 Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related