Float rounding error in time-critical C++ loop, looking for an efficient solution

Question

As a premise, I am aware that this problem has been addressed already, but never in this specific scenario, from what I could find searching.

In a time-critical piece of code, I have a loop where a float value x must grow linearly from exactly 0 to-and-including exactly 1 in 'z' steps.

The un-optimized solution, but which would work without rounding errors, is:

const int z = (some number);
int c;
float x;

for(c=0; c<z; c++)
{
   x = (float)c/(float)(z-1);
   // do something with x here
}

obviously I can avoid the float conversions and use two loop variables and caching (float)(z-1):

const int z = (some number);
int c;
float xi,x;
const float fzm1 = (float)(z-1);

for(c=0,xi=0.f; c<z; c++, xi+=1.f)
{
   x=xi/fzm1;
   // do something with x
}

But who would ever repeat a division by a constant for every loop pass ? Obviously anyone would turn it into a multiplication:

const int z = (some number);
int c;
float xi,x;
const float invzm1 = 1.f/(float)(z-1);

for(c=0,xi=0.f; c<z; c++, xi+=1.f)
{
   x=xi * invzm1;
   // do something with x
}

Here is where obvious rounding issues may start to manifest. For some integer values of z, (z-1)*(1.f/(float)(z-1)) won't give exactly one but 0.999999..., so the value assumed by x in the last loop cycle won't be exactly one.

If using an adder instead, i.e

const int z = (some number);
int c;
float x;
const float x_adder = 1.f/(float)(z-1);

for(c=0,x=0.f; c<z; c++, x+=x_adder)
{
   // do something with x
}

the situation is even worse, because the error in x_adder will build up.

So the only solution I can see is using a conditional somewhere, like:

const int z = (some number);
int c;
float xi,x;
const float invzm1 = 1.f/(float)(z-1);

for(c=0,xi=0.f; c<z; c++, xi+=1.f)
{
   x = (c==z-1) ? 1.f : xi * invzm1;
   // do something with x
}

but in a time-critical loop a branch should be avoided if possible !

Oh, and I can't even split the loop and do


for(c=0,xi=0.f; c<z-1; c++, xi+=1.f) // note: loop runs now up to and including z-2
{
   x=xi * invzm1;
   // do something with x
}

x=1.f;
// do something with x

because I would have to replicate the whole block of code 'do something with x' which is not short or simple either, I cannot make of it a function call (would be inefficient, too many local variables to pass) nor I want to use #defines (would be very poor and inelegant and impractical).

Can you figure out any efficient or smart solution to this problem ?

Use the loop to get all but the final value (since you already know the final value). — Scott Hunter
– Scott Hunter, Commented Feb 6, 2023 at 16:49
Doesn't either the numerator or denominator have to be a float for the division to result in a float aswell? That would save at least one cast every calculation. — B Remmelzwaal
– B Remmelzwaal, Commented Feb 6, 2023 at 16:51
Have you actually benchmarked all the options ? I somehow doubt the cost of the branch (last proposition) would be too bad, and the compiler might actually unroll the last iteration. — Vincent Fourmond
– Vincent Fourmond, Commented Feb 6, 2023 at 17:03
@Scott Hunter if you read my question well I specified that I can't do it because that would imply repeating the code which must process 'x' — BadHellie
– BadHellie, Commented Feb 7, 2023 at 22:10
@B Remmelzwaal evidently you didn't read my question thoughly because I did exactly so in the second optimization step — BadHellie
– BadHellie, Commented Feb 7, 2023 at 22:11

Homer512 · Accepted Answer · 2023-02-07 07:34:29Z

4

First, a general consideration: xi += 1.f introduces a loop-carried dependency chain of however many cycles your CPU needs for floating point addition (probably 3 or 4). It also kills any attempt at vectorization unless you compile with -ffast-math. If you run on a modern super-scalar desktop CPU, I recommend using an integer counter and converting to float in each iteration.

In my opinion, avoiding int->float conversions is outdated advise from the era of x87 FPUs. Of course you have to consider the entire loop for the final verdict but the throughput is generally comparable to floating point addition.

For the actual problem, we may look at what others have done, for example Eigen in the implementation of their LinSpaced operation. There is also a rather extensive discussion in their bug tracker.

Their final solution is so simple that I think it is okay to paraphrase it here, simplified for your specific case:

float step = 1.f / (n - 1);
for(int i = 0; i < n; ++i)
    float x = (i + 1 == n) ? 1.f : i * step;

The compiler may choose to peel off the last iteration to get rid of the branch but in general it is not too bad anyway. In scalar code branch prediction will work well. In vectorized code it's a packed compare and a blend instruction.

We may also force the decision to peel off the last iteration by restructuring the code appropriately. Lambdas are very helpful for this since they are a) convenient to use and b) very strongly inlined.

auto loop_body = [&](int i, float x) mutable {
    ...;
};
for(int i = 0; i < n - 1; ++i)
    loop_body(i, i * step);
if(n > 0)
    loop_body(n - 1, 1.f);

Checking with Godbolt (using a simple array initialization for the loop body), GCC only vectorizes the second version. Clang vectorizes both but does a better job with the second.

edited Feb 7, 2023 at 7:34

answered Feb 6, 2023 at 18:07

Homer512

15.1k2 gold badges16 silver badges35 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

BadHellie Over a year ago

thanks for your interesting elaboration. It therefore looks like an 'universal' solution does not exist... My software is not targeted at one specific CPU but is mainly thought for averagely recent 64 bits Intel cpus. Sadly I cannot benchmark every single function of my program because it would be too long and complex. Re: your point of not avoiding the int->float conversion, I always tried to avoid type conversions instead, unless really unavoidable. I will surely deepen this topic, thanks for pointing out

BadHellie Over a year ago

Actually, by a quick look at the asm, it looks like GCC is actually peeling off the last iteration (using the highest optimization settings and -ffast-math)

Homer512 Over a year ago

@elena good to know that it works for you. Regarding CPUs: Basically any desktop or smartphone CPU that is not truly ancient should work fine. Only 32 bit without SSE2 may suffer. Pentium M or above are fine. Going by Agner's instruction tables

Damir Tenishev · Accepted Answer · 2023-02-06 21:53:41Z

1

What you need is Bresenham's line algorithm.

It would allow you to avoid multiplication and divisions and use add/sub only. Just scale your range so that it could be represented by integer numbers and round up at final stage if precise split to parts is mathematically (or "representatively") impossible.

answered Feb 6, 2023 at 21:53

Damir Tenishev

3,5688 silver badges24 bronze badges

Comments

Arkady · Accepted Answer · 2023-02-06 16:55:06Z

0

Consider using this:

const int z = (some number > 0);
const int step = 1000000/z;
for(int c=0; c<z-1; ++c)
{
   x += step; //just if you really need the conversion, divide it by 1000000 when required
   // do something with x
}
x = 1.f;
//do the last step with x

No conversions if you don't really need it, first and last values are as expected, multiplication is reduced to accumulation. By changing of 1000000 you can manually control the precision.

answered Feb 6, 2023 at 16:55

Arkady

2,2473 gold badges28 silver badges49 bronze badges

2 Comments

Peter Over a year ago

Problem with this is that rounding error in the calculation of step will propagate and the error in x will increase as number of iterations increases.

Arkady Over a year ago

@Peter I agree, but reasonable accuracy can be achieved by increasing of 1000000, and in general accuracy will be an admissible problem anyway

Eugene · Accepted Answer · 2023-02-06 17:37:40Z

0

I suggest that you start with the last alternative you have shown and use lambda to avoid passing local variables:

auto do_something_with_x = [&](float x){/*...*/}

for(c=0,xi=0.f; c<z-1; c++, xi+=1.f) // note: loop runs now up to and including z-2
{
   x=xi * invzm1;
   do_something_with_x(x);
}

do_something_with_x(1.f);

answered Feb 6, 2023 at 17:37

Eugene

7,7762 gold badges26 silver badges37 bronze badges

Collectives™ on Stack Overflow

Float rounding error in time-critical C++ loop, looking for an efficient solution

4 Answers 4

3 Comments

Comments

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

3 Comments

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related