I am using Visual Studio 2012, building in x64 Release mode. The following loop takes 33.5% of my program's run time; I measured it with the Visual Studio profiler.
//every variable is unsigned int or unsigned int*
for(unsigned int i = 0; i < num; i++)
{
    unique[ids[i]]++;              //2.1%
    total[ids[i]] += main_list[id];//31.4%
}
Can someone please recommend a way to decrease the time this loop takes to run?
Edit: based on your input I tried the following code:
const unsigned int now = main_list[id];
unsigned int i = ids[0];
for(unsigned int j = 1; j < num; j++)
{
    ++unique[i];    //2.0%
    total[i] += now;//16.7%
    i = ids[j];     //16.8% (load the next index early)
}
++unique[i];        // epilogue: the last id still needs processing
total[i] += now;
This confirms the theory that CPU branch prediction probably fails because the access positions are random (by the way, they are not completely random — they are sorted). Any ideas on how to speed up this code?
2nd edit: I tried the following:
const unsigned int now = main_list[id];
unsigned int i = ids[0];
for(unsigned int j = 1; j < num; j++)
{
    total[i] += now;//2.0%
    ++unique[i];    //16.7%
    i = ids[j];     //16.8%
}
total[i] += now;    // epilogue for the last id
++unique[i];
The above test should make it clear what is going on: swapping the two statements also swaps their measured costs, so the time belongs to the memory access pattern rather than to either particular operation.
unique[ids[i]] and total[ids[i]] are probably killing a lot of your performance, because indirect array references limit the compiler's ability to do software prefetching. Also, if you could iterate directly through total and unique you could easily use an OpenMP for loop with SSE intrinsics to do your increment and add/save operations 4 at a time per thread. Even if all num elements have unique ids, you still only use ~2.5% of your unique and total array slots. If the ids are spread out randomly across the range, you'll be accessing a handful of bytes per memory page.