
Given a real number X within [0,1], after a specific binning I have to identify which bin X falls into. Given the bin size dx, I am using i = std::size_t(X/dx), which works very well. I then look up the corresponding value in a given array v and set a second variable Y using double Y = v[i]. The whole code looks as follows:

double X = func();
double dx = 0.01;                                   // bin size
std::size_t i = static_cast<std::size_t>(X / dx);   // bin index
double Y = v[i];                                    // look up the binned value
std::cout << Y << '\n';                             // requires <iostream>

This method correctly gives the expected value for the index i within the range [0, length(v)).

My main issue is not with finding the index but with using it: X comes from an auxiliary function, and whenever I set Y = v[i] using the index determined above, the code becomes extremely slow. Without commenting out or removing any of the lines, the code becomes much faster if I overwrite X with some random value between 0 and 1 right after its definition, or if I set i to some random value between 0 and the length of v after the third line.
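
To make the comparison concrete, here is a sketch of the three variants described above (func(), v, dx, and the overwrite values 0.37 and 42 are placeholders, since the real code is not shown here):

// Hypothetical reconstruction of the three variants; func() and v are assumed
// to be defined elsewhere.
double dx = 0.01;

// Variant A (slow): the index comes straight from func()
double X = func();
std::size_t i = static_cast<std::size_t>(X / dx);
double Y = v[i];

// Variant B (fast): X is overwritten right after its definition
double X2 = func();
X2 = 0.37;                                          // any fixed value in [0, 1)
std::size_t i2 = static_cast<std::size_t>(X2 / dx);
double Y2 = v[i2];

// Variant C (fast): the index itself is overwritten
double X3 = func();
std::size_t i3 = static_cast<std::size_t>(X3 / dx);
i3 = 42;                                            // any fixed value in [0, v.size())
double Y3 = v[i3];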

Could anyone tell me why this occurs? The speed changes by a factor of 1000 if not more, and since the faster variants only contain additional steps and func() is called anyway, I can't understand why they should be faster.

  • Please read about how to create a minimal reproducible example and try to provide one. Without the code it is impossible to tell why it is faster/slower. Commented May 30, 2022 at 14:32
  • Measuring code speed in C++ is difficult and there are tons of pitfalls. Please show us how you measured. Commented May 30, 2022 at 14:36
  • With all the values known beforehand, it is possible to do the division and the conversion to an integer at compile time. Y = v[42]; would be faster than also computing i. Commented May 30, 2022 at 14:37
  • Make sure you compile with optimizations enabled. Commented May 30, 2022 at 14:38
  • What is the size of the bins (v) array? Commented May 30, 2022 at 14:50

1 Answer


Since you have put no code in the question, this has to be a wild guess; possible causes:

  • You didn't sort the X results before accessing the lookup table. Processing a sorted array is faster.

  • Some of the X values were denormalized (subnormal), which takes a toll on computation time on certain CPU types, including yours (see the denormal check sketched after this list).

  • The dataset is too big for the L3 cache, so the lookups always went to RAM instead of getting the quick cache hits seen in the other test.

  • The compiler was optimizing all of the expensive function calls away in one test, but not in the real-world scenario (the benchmarking sketch after this list shows one way to keep the result observable).

  • The time measurement has bugs (a minimal std::chrono-based timing sketch follows the list).

  • The computer's performance is not stable (for example, a shared server, or an antivirus intervention eating RAM bandwidth).
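
If denormals are the suspect, one way to check is sketched below. It is only an illustration, not code from the question: std::fpclassify detects subnormal values, and on x86 the SSE flush-to-zero / denormals-are-zero flags make the hardware treat them as zero, which usually removes the slowdown.

#include <cmath>        // std::fpclassify, FP_SUBNORMAL
#include <cstddef>
#include <cstdio>
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (x86, SSE3)
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (x86, SSE)

int main() {
    // Diagnostic: count how many sampled values are subnormal.
    std::size_t denormals = 0;
    for (int k = 0; k < 1000000; ++k) {
        double X = 1e-320;                      // stand-in for func(); clearly subnormal
        if (std::fpclassify(X) == FP_SUBNORMAL)
            ++denormals;
    }
    std::printf("subnormal samples: %zu\n", denormals);

    // If the count is large, treating subnormals as zero for this thread often
    // restores normal speed, since denormal arithmetic is heavily penalized on
    // many CPUs.
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}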
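
For the measurement and dead-code points, here is a minimal, self-contained timing sketch (the table size, iteration count, and the random stand-in for func() are made up for illustration). It uses std::chrono::steady_clock and accumulates the looked-up values into a sum that is printed at the end, so the optimizer cannot delete the loop as dead code.

#include <chrono>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

int main() {
    const double dx = 0.01;
    std::vector<double> v(101, 1.0);             // stand-in lookup table (one spare slot in case X == 1.0)
    std::mt19937 gen(42);
    std::uniform_real_distribution<double> dist(0.0, 1.0);

    const auto t0 = std::chrono::steady_clock::now();

    double sum = 0.0;                            // keeps the lookups observable
    for (int k = 0; k < 10000000; ++k) {
        double X = dist(gen);                    // stand-in for func()
        std::size_t i = static_cast<std::size_t>(X / dx);
        sum += v[i];
    }

    const auto t1 = std::chrono::steady_clock::now();
    const double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();

    // Printing the sum prevents the compiler from removing the whole loop.
    std::printf("sum = %f, elapsed = %.2f ms\n", sum, ms);
}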
