If multiple threads write to the same memory location simultaneously, there will be a race condition, right? That is what is happening in my case.
Consider this fragment of the kernel in 'reduce.cl':
    int i = get_global_id(0);
    int n, j;
    n = keyMobj[i];  // n is the key; it is either 0 or 1
    for (j = 0; j < 2; j++)
        sumMobj[n*2+j] += dataMobj[i].dattr[j];  // summing operation
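Yes: `sumMobj[n*2+j] += ...` is a read-modify-write (load, add, store), so two work-items with the same key can both load the old value and one update is lost. A common fix is to make the update atomic. OpenCL 1.x provides `atomic_add` only for 32-bit integers, not for `float`; a frequently used workaround is a compare-and-swap loop with `atomic_cmpxchg` on the value's bit pattern (`as_int`/`as_float`). The sketch below reproduces that loop on the CPU in C11, with one pthread standing in for each work-item, so the idea can be compiled and checked outside OpenCL. It assumes `dattr` holds two `float` values; `Data`, `MAX_ITEMS`, `atomic_add_float`, `reduce_by_key`, and the other helper names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NUM_KEYS 2    /* keys are 0 or 1, as in the kernel */
#define NUM_ATTR 2    /* dattr has two components */
#define MAX_ITEMS 64  /* hypothetical upper bound on work-items */

/* Hypothetical stand-in for the element type of dataMobj. */
typedef struct { float dattr[NUM_ATTR]; } Data;

/* Atomic float add built from compare-and-swap on the 32-bit pattern,
   the same idea as an atomic_cmpxchg loop with as_int/as_float. */
static void atomic_add_float(_Atomic uint32_t *addr, float val)
{
    uint32_t old_bits = atomic_load(addr);
    uint32_t new_bits;
    do {
        float old_val, new_val;
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + val;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        /* On failure old_bits is refreshed with the current value; retry. */
    } while (!atomic_compare_exchange_weak(addr, &old_bits, new_bits));
}

typedef struct {
    const int *keyMobj;
    const Data *dataMobj;
    _Atomic uint32_t *sumBits;  /* shared sums, stored as float bit patterns */
    int i;                      /* plays the role of get_global_id(0) */
} WorkItem;

/* Kernel body with the racy += replaced by the atomic add. */
static void *work_item(void *arg)
{
    const WorkItem *w = arg;
    int n = w->keyMobj[w->i];
    assert(n == 0 || n == 1);
    for (int j = 0; j < NUM_ATTR; j++)
        atomic_add_float(&w->sumBits[n * NUM_ATTR + j], w->dataMobj[w->i].dattr[j]);
    return NULL;
}

/* Runs one thread per item and copies the final sums into sumMobj. */
static void reduce_by_key(const int *keyMobj, const Data *dataMobj, int count,
                          float sumMobj[NUM_KEYS * NUM_ATTR])
{
    _Atomic uint32_t sumBits[NUM_KEYS * NUM_ATTR];
    pthread_t tid[MAX_ITEMS];
    WorkItem items[MAX_ITEMS];

    assert(count <= MAX_ITEMS);
    for (int s = 0; s < NUM_KEYS * NUM_ATTR; s++)
        atomic_init(&sumBits[s], 0u);  /* all-zero bits == 0.0f in IEEE 754 */

    for (int i = 0; i < count; i++) {
        items[i] = (WorkItem){ keyMobj, dataMobj, sumBits, i };
        pthread_create(&tid[i], NULL, work_item, &items[i]);
    }
    for (int i = 0; i < count; i++)
        pthread_join(tid[i], NULL);

    for (int s = 0; s < NUM_KEYS * NUM_ATTR; s++) {
        uint32_t bits = atomic_load(&sumBits[s]);
        memcpy(&sumMobj[s], &bits, sizeof bits);
    }
}
```

This removes the lost updates, but with only four target addresses every work-item of a key retries on the same location, so the atomics largely serialize. The additions also happen in a nondeterministic order, so float results can differ in the last bits between runs.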
Here, sumMobj[0] and sumMobj[1] are updated by 4 threads simultaneously (the work-items whose key is 0), and sumMobj[2] and sumMobj[3] are updated by 6 threads simultaneously (the work-items whose key is 1).
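Because n is 0 or 1 and j is 0 or 1, the index n*2+j only takes the values 0 to 3: every work-item with key 0 writes slots 0 and 1, and every work-item with key 1 writes slots 2 and 3. The small C sketch below counts how many work-items write each slot; any slot with more than one writer is a race under a plain `+=`. The function name `count_writers` and the example key array are hypothetical.

```c
#include <assert.h>

#define NUM_KEYS 2  /* keys are 0 or 1 */
#define NUM_ATTR 2  /* j runs over 0..1 */

/* writers[s] receives the number of work-items that update sumMobj[s]. */
static void count_writers(const int *keyMobj, int count,
                          int writers[NUM_KEYS * NUM_ATTR])
{
    for (int s = 0; s < NUM_KEYS * NUM_ATTR; s++)
        writers[s] = 0;
    for (int i = 0; i < count; i++) {
        int n = keyMobj[i];
        assert(n == 0 || n == 1);
        for (int j = 0; j < NUM_ATTR; j++)
            writers[n * NUM_ATTR + j]++;  /* same index as sumMobj[n*2+j] */
    }
}
```

For keys {0, 0, 0, 0, 1, 1, 1, 1, 1, 1} (four 0s and six 1s, matching the 4 and 6 threads above) this yields writers = {4, 4, 6, 6}.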
Is there any way to keep this parallel, for example by using a lock or a semaphore? This summation is a very large part of my algorithm.
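Locks and semaphores are usually a poor fit inside OpenCL kernels: work-items that execute in lockstep on a GPU can spin forever waiting on a lock held by a neighbour. An alternative that avoids both locks and atomic contention is privatization, a two-phase reduction: each worker sums its own subset of items into private partial sums, and the partials are combined afterwards (on a GPU this typically becomes a per-work-group reduction in local memory followed by a second pass or a host-side sum). The C sketch below shows only the two-phase idea with pthreads, under the assumption that `dattr` holds two `float` values; `NUM_WORKERS`, the chunking scheme, `Worker`, and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_KEYS 2
#define NUM_ATTR 2
#define NUM_SLOTS (NUM_KEYS * NUM_ATTR)
#define NUM_WORKERS 4  /* hypothetical number of parallel workers */

/* Hypothetical stand-in for the element type of dataMobj. */
typedef struct { float dattr[NUM_ATTR]; } Data;

typedef struct {
    const int *keyMobj;
    const Data *dataMobj;
    int begin, end;            /* half-open range of items for this worker */
    float partial[NUM_SLOTS];  /* private sums: no other thread touches these */
} Worker;

/* Phase 1: each worker accumulates its own range without any sharing. */
static void *sum_range(void *arg)
{
    Worker *w = arg;
    for (int s = 0; s < NUM_SLOTS; s++)
        w->partial[s] = 0.0f;
    for (int i = w->begin; i < w->end; i++) {
        int n = w->keyMobj[i];
        assert(n == 0 || n == 1);
        for (int j = 0; j < NUM_ATTR; j++)
            w->partial[n * NUM_ATTR + j] += w->dataMobj[i].dattr[j];
    }
    return NULL;
}

/* Phase 2: combine the partial sums in a fixed order once each worker ends. */
static void reduce_by_key_private(const int *keyMobj, const Data *dataMobj,
                                  int count, float sumMobj[NUM_SLOTS])
{
    pthread_t tid[NUM_WORKERS];
    Worker workers[NUM_WORKERS];
    int chunk = (count + NUM_WORKERS - 1) / NUM_WORKERS;

    for (int t = 0; t < NUM_WORKERS; t++) {
        int begin = t * chunk;
        int end = begin + chunk;
        if (begin > count) begin = count;
        if (end > count) end = count;
        workers[t] = (Worker){ keyMobj, dataMobj, begin, end, {0} };
        pthread_create(&tid[t], NULL, sum_range, &workers[t]);
    }
    for (int s = 0; s < NUM_SLOTS; s++)
        sumMobj[s] = 0.0f;
    for (int t = 0; t < NUM_WORKERS; t++) {
        pthread_join(tid[t], NULL);
        for (int s = 0; s < NUM_SLOTS; s++)
            sumMobj[s] += workers[t].partial[s];
    }
}
```

Each thread writes only its own `partial` array, so phase 1 has no race at all, and phase 2 runs in a fixed order, so the float results are reproducible from run to run for a given chunking.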