I've gotten stuck writing some parallel C code using OpenMP for a concurrency course.
Here's a snippet:
#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <time.h>
#include <sys/time.h> /* struct timeval */
#include <math.h>
#include <omp.h>

#define FALSE 0
#define TRUE 1

int count_primes_0(int);
int count_primes_1(int);
int count_primes_2(int);
void time_it(int (*f)(int), int n, char *string);
int main(int argc, char *argv[]){
    int n;
    if (argc != 2){
        printf("Incorrect Invocation, use: \nq1 N");
        return 0;
    } else {
        n = atoi(argv[1]);
    }
    if (n < 0){
        printf("N cannot be negative");
        return 0;
    }
    printf("N = %d\n", n);
    //omp_set_num_threads(1);
    time_it(count_primes_0, n, "Method 0");
    time_it(count_primes_1, n, "Method 1");
    time_it(count_primes_2, n, "Method 2");
    return 0;
}
int is_prime(int n){
    for(int i = 2; i <= (int)(sqrt((double) n)); i++){
        if ((n % i) == 0){
            return FALSE;
        }
    }
    return n > 1;
}
void time_it(int (*f)(int), int n, char *string){
    clock_t start_clock;
    clock_t end_clock;
    double calc_time;
    int nprimes;
    struct timeval start_val;   /* currently unused */
    struct timeval end_val;     /* currently unused */

    start_clock = clock();
    nprimes = (*f)(n);
    end_clock = clock();
    calc_time = ((double)end_clock - (double)start_clock) / CLOCKS_PER_SEC;
    /* print the label so the output matches the results below */
    printf("%s\n\tNumber of primes: %d \t Time taken: %fs\n\n", string, nprimes, calc_time);
}
// METHOD 0
// Base case: no parallelization
int count_primes_0(int n){
    int nprimes = 0;
    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            nprimes++;
        }
    }
    return nprimes;
}
// METHOD 1
// Use only the for and critical constructs
int count_primes_1(int n){
    int nprimes = 0;
    #pragma omp parallel for
    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            #pragma omp critical
            nprimes++;
        }
    }
    return nprimes;
}
// METHOD 2
// Use reduction
int count_primes_2(int n){
    int nprimes = 0;
    #pragma omp parallel for reduction(+:nprimes)
    for(int i = 1; i <= n; i++){
        if (is_prime(i)) {
            nprimes++;
        }
    }
    return nprimes;
}
The problem I'm facing is that when I use omp_set_num_threads(), the fewer threads I use, the faster my functions run, and the closer they get to the runtime of the unparallelized base case.
Time results (run on an 8-core machine):
8 threads: Method 0: 0.07s; Method 1: 1.63s; Method 2: 1.40s
4 threads: Method 0: 0.07s; Method 1: 0.16s; Method 2: 0.16s
2 threads: Method 0: 0.07s; Method 1: 0.10s; Method 2: 0.09s
1 thread:  Method 0: 0.07s; Method 1: 0.08s; Method 2: 0.07s
I've tried disabling optimization and using a different gcc version, with no difference.
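For reference, the compile line I use is roughly the following (q1.c is a placeholder file name and the N value is just an example; nothing beyond -fopenmp should matter here):

gcc -O2 -fopenmp q1.c -o q1 -lm
./q1 100000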
Any help is appreciated.
EDIT: clock() on Linux measures CPU time summed across all threads, not wall-clock time. Wall-clock time is what I needed, so using either omp_get_wtime() or gettimeofday() produces the expected results.
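For anyone hitting the same thing, here is a minimal sketch of the timing wrapper rewritten around omp_get_wtime() (same signature as my time_it above, just a different clock source; needs <omp.h> and -fopenmp):

void time_it(int (*f)(int), int n, char *string){
    double start = omp_get_wtime();   /* wall-clock seconds */
    int nprimes = (*f)(n);
    double end = omp_get_wtime();
    printf("%s\n\tNumber of primes: %d \t Time taken: %fs\n\n",
           string, nprimes, end - start);
}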