I am new to OpenMP and I want to divide for-loop iterations into equal chunks. I have achieved this so far:
    #pragma omp parallel for schedule(static, 2) reduction(+:tot_ext)
    for (int i = 0; i < num_pos; i++) {
        if (fscanf(stdin, "%lu,%lu\n", &from, &to) != 2) {
            fprintf(stderr, "Cannot read correctly intervals file\n");
            exit(1);
        }
        time = getTime();
        text = (uchar*)malloc(to - from + 2);
        readlen = sdsl::extract(csa, from, to, text);
        tot_time += (getTime() - time);
        tot_ext += readlen;
        if (Verbose) {
            fwrite(&from, sizeof(ulong), 1, stdout);
            fwrite(&readlen, sizeof(ulong), 1, stdout);
            fwrite(text, sizeof(uchar), readlen, stdout);
        }
        free(text);
    }
Time it takes to run this query on one core: 2.72 sec.
Time it takes to run this query on two cores: 2.64 sec.
My question is: why is the difference so small?
Is Verbose true? How large is num_pos? Those are the questions one would generally ask first. In your case, though, there are several issues. You read from a file using fscanf (even though you're using C++), so it could be that you spend most of your time reading from disk. But perhaps most importantly, how did you measure this time? If you used tot_time, note that BOTH threads increment that value (not safely, but still), so the time stored there should be roughly double the actual elapsed time.
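To see where the time really goes, one option is to read all the intervals serially first and then time only the parallel loop with a single wall-clock measurement taken outside it. The following is a minimal sketch under a few assumptions, not your actual program: extract_stub is a hypothetical stand-in for sdsl::extract that only returns the interval length, process_intervals and Result are names made up for illustration, and std::chrono::steady_clock is used in place of your getTime. The per-iteration variables are declared inside the loop so that each thread gets its own copy.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for sdsl::extract(csa, from, to, text): it only
// returns the number of symbols in the closed interval [from, to].
static std::size_t extract_stub(std::size_t from, std::size_t to) {
    assert(from <= to);
    return to - from + 1;
}

struct Result {
    std::size_t tot_ext;   // total number of extracted symbols
    double wall_seconds;   // elapsed wall-clock time of the parallel loop
};

// The intervals are assumed to have been read serially beforehand, so the
// loop body performs no I/O and shares no mutable state between threads.
Result process_intervals(
        const std::vector<std::pair<std::size_t, std::size_t>>& intervals) {
    std::size_t tot_ext = 0;
    const long n = static_cast<long>(intervals.size());

    // One timer around the whole loop, started and stopped by one thread.
    const auto start = std::chrono::steady_clock::now();

    #pragma omp parallel for schedule(static) reduction(+:tot_ext)
    for (long i = 0; i < n; ++i) {
        // Declared inside the loop body, hence private to each thread.
        const std::size_t from = intervals[i].first;
        const std::size_t to = intervals[i].second;
        tot_ext += extract_stub(from, to);
    }

    const auto stop = std::chrono::steady_clock::now();
    return {tot_ext, std::chrono::duration<double>(stop - start).count()};
}
```

Calling process_intervals on the intervals (0,9), (10,14) and (3,3) gives tot_ext equal to 16. Because the timer sits outside the parallel region, wall_seconds is the elapsed time rather than a sum of per-thread times, so runs with one and two threads can be compared directly. If the loop turns out to be a small part of the total, the time is going into input or output instead.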