I am new to OpenMP. I have an Intel i7-2670QM CPU with 8 cores on a Linux Ubuntu 13.10 system.

My program uses nested parallelism in C to create a total of 8 threads. As I understand it, every thread should run on its own processor, but when I run top in the terminal, I see that my program uses only 100% CPU (800% is expected), and in the per-processor view only CPU[X] is at 100% (X is random between 0 and 7) while the other CPUs are at 0.1%.

When I profile my program with Intel VTune Amplifier, it shows that 7 threads were running, but 6 of them did not use the CPU at all; they were completely idle.

When I try another example parallel program, the threads are distributed across the cores just fine, so I think the problem is in my code:

#include <omp.h>
void recursive_function(int k)
{
    ........
    recursive_function(...);
}
int main()
{
    omp_set_nested(1);
    #pragma omp parallel for num_threads(4)
    for(i=0;i< width * height;i++)
    {
        #pragma omp critical
        {
            ......
            // 3 simple instructions
        }
        if(i!=0)
        {
            recursive_function(i);
        }
        else
        {
            int j;
            #pragma omp parallel for num_threads(4)
            for(j=i;j< width * height;j++)
            {
                recursive_function(j);
            }
        }
    }
}

I compile with gcc using the -fopenmp option.

  • Your processor does not have eight cores. It has four cores and eight hyper-threads (aka logical processors). Commented Oct 23, 2014 at 13:46
  • OK, but even so, the program should use the whole processor, not just one logical processor. Commented Oct 23, 2014 at 14:03
  • You're right. I don't know what's causing your problem based on the information you have provided. Commented Oct 23, 2014 at 19:52

2 Answers

Have you tried setting GOMP_CPU_AFFINITY?

It may be the scheduler that's not working properly.

EDIT: Changed to GOMP_CPU_AFFINITY as per Harald's comment. He provides a link there as well.

2 Comments

KMP_AFFINITY is for the Intel OpenMP runtime, not the GNU runtime. For the GNU runtime, use GOMP_CPU_AFFINITY.
Thanks for pointing this out. I forgot about this distinction.
Notice that you go into the nested region only for i==0.

That means that the outer loop is executed by a team of 4 threads (call it T1). Whenever a thread from T1 executes the iteration i=0 (let's call that thread TH1), TH1 will go into the else branch and create a parallel region with a team of 4 threads (call it T2). At this moment, team T1 has the 3 remaining threads executing the cases where i!=0, and team T2 has 4 threads (including TH1) that execute the innermost parallel region. That adds up to 7 threads.

With respect to threads being idle, that completely depends on the work they have to execute -- that is, recursive_function().
