7

I would like to run the code below: I want to spawn two independent threads, each of which runs a parallel for loop. Unfortunately, I get an error; apparently, a parallel for cannot be spawned inside a section. How can I solve that?

#include <omp.h>
#include <stdio.h>

int main()
{

omp_set_num_threads(10);

#pragma omp parallel    
#pragma omp sections
  {
#pragma omp section
#pragma omp for
    for(int i=0; i<5; i++) {
        printf("x %d\n", i);
    }

#pragma omp section
#pragma omp for
    for(int i=0; i<5; i++) {
        printf(". %d\n", i);
    }
  } // end parallel and end sections
}

And the error:

main.cpp: In function ‘int main()’:
main.cpp:14:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]
main.cpp:20:9: warning: work-sharing region may not be closely nested inside of work-sharing, critical, ordered, master or explicit task region [enabled by default]

3 Answers

9

Here you have to use nested parallelism. The problem with the omp for inside the sections is that all of the threads in scope would have to take part in the omp for, and they clearly can't, since they are split up by the sections. So you have to move the loops into functions and do nested parallelism within those functions.

#include <stdio.h>
#include <omp.h>

void doTask1(const int gtid) {
    omp_set_num_threads(5);
#pragma omp parallel 
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for(int i=0; i<5; i++) {
            printf("x %d %d %d\n", i, tid, gtid);
        }
    }
}

void doTask2(const int gtid) {
    omp_set_num_threads(5);
#pragma omp parallel 
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for(int i=0; i<5; i++) {
            printf(". %d %d %d\n", i, tid, gtid);
        }
    }
}


int main()
{
    omp_set_num_threads(2);
    omp_set_nested(1);

#pragma omp parallel    
    {
        int gtid = omp_get_thread_num();
#pragma omp sections
        {
#pragma omp section
            doTask1(gtid);

#pragma omp section
            doTask2(gtid);
        } // end parallel and end sections
    }
}

1 Comment

I feel like I should say that hardcoding the number of threads (omp_set_num_threads()), or that nested parallelism is enabled (omp_set_nested()), into the code is somewhere between not-best-practice and kind of obnoxious; you normally want the user to be able to set these with environment variables. They're set here explicitly only for tutorial purposes.
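For example, assuming the program above is compiled to ./a.out (the binary name here is just an assumption), the equivalent environment-variable setup would look roughly like this; the variable names themselves are standard OpenMP:

```shell
# 2 threads for the outer region, 5 for each nested one
# (the comma-separated list form requires OpenMP 3.0 or later)
export OMP_NUM_THREADS=2,5
# enable nested parallelism without calling omp_set_nested(1) in code
export OMP_NESTED=true
./a.out
```

This keeps the source free of hardcoded thread counts and lets each user tune them for their machine.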
3

By default, OpenMP does not run nested parallel regions in parallel; an inner parallel region is simply serialized unless nesting is explicitly enabled. The reason is that typical runtimes create num_threads threads at the start of the program and let the unused ones sleep outside of parallel regions, since waking sleeping threads is much cheaper than repeatedly spawning new ones.

Therefore you should parallelize only the loops:

#include <omp.h>
#include <stdio.h>

int main()
{

omp_set_num_threads(10);

#pragma omp parallel for
    for(int i=0; i<5; i++) {
        printf("x %d\n", i);
    }

#pragma omp parallel for
    for(int i=0; i<5; i++) {
        printf(". %d\n", i);
    }
}

2 Comments

You're right, in this case it should be set to 5, but it's no problem if it is 10, as the other 5 threads just do nothing.
I agree, it doesn't break anything.
1

In practice, the optimal number of threads equals the number of available CPU cores. So every parallel for should run across all available cores, which is impossible inside omp sections; what you are trying to achieve is therefore not optimal. tune2fs' suggestion to execute the two loops without sections makes sense and gives the best possible performance. You can execute parallel loops inside other functions, but this "cheating" doesn't give a performance boost.

2 Comments

It's completely possible he has a total of 10 expensive independent tasks, consisting of 2 5-iteration loops, and at least 10 cores. In that case, breaking it up this way makes perfect sense, because you can't use more than 5 cores on either loop.
@Jonathan Dursi - in this case your suggestion is OK. I was thinking mostly of long loops, as in image processing, which is OpenMP's main specialization.
