2

I was trying to learn something about parallel programming, so I tried to implement Peterson's algorithm for an easy example where one shared counter is incremented by 2 threads. I know that Peterson's isn't optimal due to busy waiting but I tried it only for study reasons.

I supposed that the critical section of this code is in the thread function add where the shared counter is incremented. So i call the enter_section function before the counter incrementation and after it I called the leave_function. Is this part wrong? Did I asses the critical section wrong? Problem is that the counter sometimes gives an unexpectable value when these 2 threads are done. It has to be a synchronization problem between threads but I just don't see it... Thanks for any help.

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int counter; /* global shared counter */
int flag[2] = {0, 0}; /* Variables for Peterson's algorithm */
int turn = 0;

typedef struct threadArgs 
{
    pthread_t thr_ID;
    int num_of_repeats;
    int id;
} THREADARGS;

void enter_section (int thread) {
    int other = 1 - thread;

    flag[thread] = 1;
    turn = thread;

    while ((turn == thread) && (flag[other] == 1));

    return;
}

void leave_section (int thread) {
    flag[thread] = 0;

    return;
}


void * add (void * arg) {

    int i;
    THREADARGS * a = (THREADARGS *) arg;

    for (i = 0; i < a->num_of_repeats; i++) {
        enter_section(a->id);
        counter++;
        leave_section(a->id);
    }

    return NULL;
}

int main () {
    int i = 1;
    pthread_attr_t thrAttr;
    THREADARGS threadargs_array[2];

    pthread_attr_init (&thrAttr);
    pthread_attr_setdetachstate (&thrAttr, PTHREAD_CREATE_JOINABLE);

    /* creating 1st thread */

    threadargs_array[0].id = 0;
    threadargs_array[0].num_of_repeats = 1000000;

    pthread_create(&threadargs_array[0].thr_ID, &thrAttr, add, &threadargs_array[0]);

    /* creating 2nd thread */

    threadargs_array[1].id = 1;
    threadargs_array[1].num_of_repeats = 2000000;

    pthread_create(&threadargs_array[1].thr_ID, &thrAttr, add, &threadargs_array[1]);

    /* free resources for thread attributes */
    pthread_attr_destroy (&thrAttr);

    /* waiting for 1st thread */
    pthread_join (threadargs_array[0].thr_ID, NULL);
    printf("First thread is done.\n");

    /* waiting for 2nd thread */
    pthread_join (threadargs_array[1].thr_ID, NULL);
    printf("Second thread is done.\n");


    printf("Counter value is: %d \n", counter);
    return (EXIT_SUCCESS);
}
2
  • There's a description of what can happen to Peterson's synchronization at bartoszmilewski.com/2008/11/05/… . Though I could swear an sfence (store fence) would be enough, plus, you need compiler barriers, which I would put at the mfence/sfence, and at each iteration of the while loop. Commented Feb 27, 2012 at 19:49
  • If you're still interested in this, there's another discussion at stackoverflow.com/questions/11588514/… Commented Jul 25, 2012 at 19:26

1 Answer 1

4

You have several problems here:

  • the access to your variables will we subject to optimization, so you'd have to declare them volatile, at least.
  • Algorithms like this that access data between threads without any of the lock data structures that are provided by POSIX can only work if your primitive operations are guaranteed to be atomic, which they usually aren't on modern processors. In particular the ++ operator is not atomic.

There would be several ways around this, in particular the new C standard C11 offers atomic primitives. But if this is really meant for you as a start to learn parallel programming, I'd strongly suggest that you first look into mutexes, condition variables etc, to learn how POSIX is intended to work.

Sign up to request clarification or add additional context in comments.

2 Comments

It would be better to use compiler barriers (__asm__ __volatile__("":::"memory") in GCC) instead of volatile. Anyway, inserting proper CPU and compiler barriers in enter_section and leave_section seems hard, so I wouldn't do it unless I'd had proper sleep.
Better would be to use the proper language constructs from C11 that give the proper guarantees about reordering, and not to trick around with compiler extensions :) In any case, it would be better to stay away from such stuff as an introductory of parallel programming.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.