OpenMP: Get total number of running threads

Question

I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team.

However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more.

Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads?

If more detail is required, consider the following pseudo-code that models my workflow quite closely:

```
function divide_and_conquer(Job job, int total_num_threads):
  if job.is_leaf(): # Recurrence base case.
    job.process()
    return

  left, right = job.divide()

  current_num_threads = omp_get_num_threads()
  if current_num_threads < total_num_threads: # (1)
    #pragma omp parallel num_threads(2)
      #pragma omp section
        divide_and_conquer(left, total_num_threads)
      #pragma omp section
        divide_and_conquer(right, total_num_threads)

  else:
    divide_and_conquer(left, total_num_threads)
    divide_and_conquer(right, total_num_threads)

  job = merge(left, right)
```

If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level.

I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.
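To make the problem with check (1) concrete, here is a small hypothetical C test (max_inner_team_size is an illustrative name, not part of the question). It opens two nested `parallel num_threads(2)` regions and records what omp_get_num_threads() reports at the inner level. It assumes OpenMP 3.0 for omp_set_max_active_levels(); without a call like that, many implementations serialize nested regions, so the inner teams have only one thread. The stubs under `#else` let it build without -fopenmp, in which case everything runs on a single thread.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static int omp_get_num_threads(void) { return 1; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Opens two nested `parallel num_threads(2)` regions. Returns the largest
   team size omp_get_num_threads() reported at the inner level and stores
   the number of inner-level threads that actually ran in *total. */
static int max_inner_team_size(int *total)
{
    int max_team = 0;
    int count = 0;

    assert(total != NULL);
    omp_set_max_active_levels(2);   /* allow two active levels (OpenMP 3.0) */

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            int team = omp_get_num_threads();   /* size of this team only */
            #pragma omp critical
            {
                if (team > max_team)
                    max_team = team;
                ++count;
            }
        }
    }
    *total = count;
    return max_team;
}
```

With nesting enabled you would typically see an inner team size of 2 while 4 threads are running, which is why comparing omp_get_num_threads() against a global total never stops the recursion.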

You could set the OMP_THREAD_LIMIT environment variable to limit the maximum number of OpenMP threads available to a program. – jfs, Commented Jan 16, 2011 at 18:07
@J.F. Sebastian: considering the function definition, I guess the OP wants a dynamic limit, which the environment variable can't provide. – jweyrich, Commented Jan 16, 2011 at 18:17
@jweyrich: I was commenting on the 'I want to spawn new threads as long as there are still idle processors, but no more.' part. The number of CPUs is not very dynamic, so the environment variable will do. – jfs, Commented Jan 16, 2011 at 23:52

Jonathan Dursi · Accepted Answer · 2011-01-16 18:23:56Z

Score: 5

I don't think there is any such routine, at least in OpenMP 3, and if there were, I'm not sure it would help, as there's obviously a huge race condition between counting the number of threads and forking. You could end up overshooting your target number of threads by almost a factor of 2 if every thread sees that there's room for one more thread and then every thread spawns one.

If this really is the structure of your program, though, and you just want to limit the total number of threads, there are options (all of these are OpenMP 3.0):

1. Use the OMP_THREAD_LIMIT environment variable to limit the total number of OpenMP threads.
2. Use OMP_MAX_ACTIVE_LEVELS, omp_set_max_active_levels(), or a test against omp_get_level() to limit how deeply nested your threads are. With two threads per fork, 4 levels of nesting gives 2^4 = 16 threads, so if you only want 16 threads, limit to 4 levels.
3. If you want finer control than powers of two, you can use omp_get_level() to find your level and call omp_get_ancestor_thread_num(int level) at various levels to find out which thread was your parent, grandparent, etc. From that (using this simple left-right forking) you can determine a global thread ID. I think in this case it would be something like $\sum_{l=0}^{L} a_l \, 2^{L-l}$, where $L$ is your current level from omp_get_level(), $l$ is the level number starting at 0, and $a_l$ is the ancestor thread number at that level (always 0 at level 0). This would let you (say) allow threads 0-3 to fork but not 4-7, so that you'd end up with 12 rather than 16 threads. I think this only works in such a regular situation; if each parent thread forked a different number of child threads, I don't think you could determine a unique global thread ID, because it looks like you can only query your direct ancestors.
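A sketch of the ancestor-based global ID, assuming every level forks with exactly num_threads(2) and that OpenMP 3.0 is available (so it would not help with 2.5). binary_global_id() and collect_ids() are hypothetical helpers. The loop accumulates the ID in Horner form, id = 2*id + a_l for l = 1..L, where L is omp_get_level() and a_l is omp_get_ancestor_thread_num(l); level 0 is skipped because its ancestor thread number is always 0.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without -fopenmp. */
static int omp_get_level(void) { return 0; }
static int omp_get_ancestor_thread_num(int level) { (void)level; return 0; }
static void omp_set_max_active_levels(int levels) { (void)levels; }
#endif

/* Global ID of the calling thread, assuming every enclosing parallel
   region was opened with num_threads(2): sum over l = 1..L of
   a_l * 2^(L - l), evaluated with Horner's scheme. */
static int binary_global_id(void)
{
    int depth = omp_get_level();
    int id = 0;
    for (int l = 1; l <= depth; ++l)
        id = 2 * id + omp_get_ancestor_thread_num(l);
    return id;
}

/* Forks two levels deep and counts how often each global ID shows up in
   seen[0..3]; returns the number of inner-level threads that ran. */
static int collect_ids(int seen[4])
{
    int count = 0;
    for (int i = 0; i < 4; ++i)
        seen[i] = 0;
    omp_set_max_active_levels(2);   /* let the inner regions be active */

    #pragma omp parallel num_threads(2)
    {
        #pragma omp parallel num_threads(2)
        {
            int id = binary_global_id();
            assert(id >= 0 && id < 4);
            #pragma omp critical
            {
                ++seen[id];
                ++count;
            }
        }
    }
    return count;
}
```

A thread could then fork only while binary_global_id() is below some cutoff (for example, below 4 at level 3), which gives the 12-thread case described in point 3.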

answered Jan 16, 2011 at 18:23


3 Comments

Konrad Rudolph Over a year ago

(1) You are right about the race condition. It doesn’t affect correctness, though, and even with that race condition the performance of capping the thread number is vastly better than not capping. (2) Your proposed approach to capping the thread number is of course vastly superior to mine. However, I cannot use it, since the number of processors in my code has to be user-controllable. I’m effectively forced to subvert OpenMP’s thread management. Not ideal, I know, but outside of my control (for the moment).

Jonathan Dursi Over a year ago

Fair enough; sometimes there are other constraints you have to work within.

Konrad Rudolph Over a year ago

Your suggestion (3) actually looks incredibly good. Unfortunately, I also have to support OpenMP 2.5. :-(

ejd · Answer · 2011-01-17 15:49:46Z

Score: 2

The code you have shown has a problem: an "omp section" has to be within the lexical scope of an "omp sections". I am assuming that you meant the "omp parallel" to be an "omp parallel sections". The other way to do this is to use "omp task"; then you don't have to keep count of the number of threads. You would just assign the threads to the parallel region and let the OpenMP implementation assign the tasks to the threads.
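A minimal sketch of the task-based approach, assuming OpenMP 3.0 or later (tasks do not exist in 2.5). The job is a hypothetical stand-in for the question's Job: summing a range of an array, split in half until a range has at most 1000 elements. The team is created once, a single thread starts the recursion, and idle threads in the team pick up the spawned tasks, so no thread counting is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job: sum of a[lo, hi). Leaves are processed serially. */
static long sum_range(const int *a, size_t lo, size_t hi)
{
    if (hi - lo <= 1000) {                /* leaf (base case) */
        long s = 0;
        for (size_t i = lo; i < hi; ++i)
            s += a[i];
        return s;
    }
    size_t mid = lo + (hi - lo) / 2;      /* divide */
    long left = 0, right = 0;

    #pragma omp task shared(left)
    left = sum_range(a, lo, mid);
    #pragma omp task shared(right)
    right = sum_range(a, mid, hi);
    #pragma omp taskwait                  /* wait for both child tasks */

    return left + right;                  /* merge */
}

/* One team for the whole computation; its threads execute the tasks. */
static long parallel_sum(const int *a, size_t n)
{
    long total = 0;
    assert(a != NULL || n == 0);

    #pragma omp parallel
    {
        #pragma omp single
        total = sum_range(a, 0, n);
    }
    return total;
}
```

Without -fopenmp the pragmas are ignored and the same code runs serially with the same result.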

answered Jan 17, 2011 at 15:49


1 Comment

Konrad Rudolph Over a year ago

Unfortunately, OpenMP 2.5 doesn’t have tasks yet. You’re right concerning the sections pragma.
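For OpenMP 2.5, where tasks are unavailable, here is a hedged sketch of the question's structure with the sections pragma fixed as ejd describes. It swaps in a different technique from counting running threads: each call receives a thread budget and splits it between its two halves, so the number of threads working at once stays within the budget passed at the top, and that number can come from the user. The array-sum job and the names dac_sum and budgeted_sum are illustrative assumptions. omp_set_nested() is the 2.5-era way to enable nested regions; it is deprecated as of OpenMP 5.0.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical job: sum of a[lo, hi). `budget` is how many threads this
   call may occupy, including the calling thread itself. */
static long dac_sum(const int *a, size_t lo, size_t hi, int budget)
{
    if (hi - lo <= 1000) {                 /* leaf (base case) */
        long s = 0;
        for (size_t i = lo; i < hi; ++i)
            s += a[i];
        return s;
    }
    size_t mid = lo + (hi - lo) / 2;
    long left = 0, right = 0;

    if (budget >= 2) {
        int lb = budget / 2, rb = budget - lb;   /* both halves get >= 1 */
        /* `section` is lexically inside `parallel sections` here */
        #pragma omp parallel sections num_threads(2)
        {
            #pragma omp section
            left = dac_sum(a, lo, mid, lb);
            #pragma omp section
            right = dac_sum(a, mid, hi, rb);
        }
    } else {                               /* no budget left: stay serial */
        left = dac_sum(a, lo, mid, 1);
        right = dac_sum(a, mid, hi, 1);
    }
    return left + right;                   /* merge */
}

static long budgeted_sum(const int *a, size_t n, int max_threads)
{
    assert(max_threads >= 1);
#ifdef _OPENMP
    omp_set_nested(1);   /* nested regions are often serialized by default */
#endif
    return dac_sum(a, 0, n, max_threads);
}
```

Because the encountering thread joins each new team, a fork with budget b adds only one thread, and the halves' budgets always add up to b.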

Moehre2 · Answer · 2024-01-31 17:04:50Z

Score: 1

A bit late to the party...

I had a similar problem and solved it with this method (in C):

```c
#include <omp.h>

int get_num_threads(void) {
    int num_threads = 1;
    #pragma omp parallel
    {
        /* one thread records the size of this new team */
        #pragma omp single
        num_threads = omp_get_num_threads();
    }
    return num_threads;
}
```

It creates a parallel region and lets one thread save the number of threads available in that region. It might not be the fastest approach, but it works fine for me.

EDIT:

Actually, I discovered that this is compiler-dependent: my proposed method does not work in newer Clang versions. This is my new version, which still gets the total number of available threads (and is even uglier than before):

```c
int get_num_threads(void) {
    int num_threads = 0;
    /* every thread of the new team adds 1; the reduction sums them */
    #pragma omp parallel reduction(+:num_threads)
    num_threads += 1;
    return num_threads;
}
```

(I found this somewhere on the internet but cannot find the site anymore)

edited Jan 31, 2024 at 17:04

answered Aug 9, 2023 at 10:31


2 Comments

hofingerandi Over a year ago

The OP stated that omp_get_num_threads() yields the wrong result in this scenario.

Moehre2 Over a year ago

Yes, because, as the OP correctly pointed out, omp_get_num_threads() only yields the number of threads in the current team. Since in the OP's code omp_get_num_threads() is not called within a #pragma omp parallel region, it will always return 1. The method above should fix this problem.

jweyrich · Answer · 2011-01-16 17:19:46Z

Score: -5

Given that you know exactly how many threads are being created, the simplest solution I can come up with is to keep your own thread counter.

Be aware that I'm completely in the dark about OpenMP, as I've never really used it.

answered Jan 16, 2011 at 17:19


1 Comment

Victor Eijkhout Over a year ago

Remember that he is creating threads recursively? You'd need the counter to be in shared memory and update it inside a critical region. Not a good idea.
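For completeness, a shared counter like the one discussed here might look as follows. It is a minimal sketch with hypothetical names (threads_in_use, try_reserve, release). The check and the update happen inside the same named critical section, which works in OpenMP 2.5 and also closes the count-then-fork race described in Jonathan Dursi's answer, at the cost of serializing every fork decision.

```c
#include <assert.h>

/* Hypothetical process-wide count of threads in use; starts with the
   initial thread. Only touched inside critical(thread_budget). */
static int threads_in_use = 1;

/* Reserves `extra` threads if that keeps the total within `limit`.
   Returns 1 on success, 0 if the caller should recurse serially. */
static int try_reserve(int extra, int limit)
{
    int ok = 0;
    #pragma omp critical(thread_budget)
    {
        if (threads_in_use + extra <= limit) {
            threads_in_use += extra;
            ok = 1;
        }
    }
    return ok;
}

/* Gives back threads reserved earlier, once the forked team has joined. */
static void release(int extra)
{
    assert(extra >= 0);
    #pragma omp critical(thread_budget)
    threads_in_use -= extra;
}
```

In the question's function, one would call try_reserve(1, total_num_threads) before the parallel sections (the encountering thread joins the new team, so a two-thread fork adds only one thread) and release(1) after the region ends, recursing serially when the reservation fails.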
