float overflow?

Question

The following code always seems to produce the wrong result. I have tested it with gcc and with Visual Studio on Windows. Is this caused by float overflow or by something else? Thanks in advance :)

#include <stdio.h>
#define N 51200000
int main()
{
 float f = 0.0f;
 for(int i = 0; i < N; i++)
  f += 1.0f;
 fprintf(stdout, "%f\n", f);
 return 0;
}

Please clarify what you mean by "wrong result": what is your expected result, and what is your actual result?
– AakashM, Commented Jul 13, 2010 at 9:41
@AakashM: the expected result is 51200000; the actual result would be a lot less. I haven't run the code, but I would guess the actual result is something like 16777216.
– Mike Seymour, Commented Jul 13, 2010 at 9:46
As Mike stated, the expected result is 51200000 and the actual result is 16777216. Sorry for the lack of clarity.
– GBY, Commented Jul 13, 2010 at 10:04

Ignacio Vazquez-Abrams · Accepted Answer · 2010-07-13 09:41:56Z

14

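float stores only 23 significand bits, which gives 24 bits of precision once the implicit leading bit is counted. Counting up to 51200000 in steps of 1 requires 26. Simply put, you do not have the precision required for a correct answer.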


5 Comments

MSalters Over a year ago

Actually, you do not need 26 bits for 512000000. It's 15625 * 2^15. Therefore you need 14 bits of precision and 4 bits of exponent. The actual problem is that 511999999 needs 26 bits of precision.

Ignacio Vazquez-Abrams Over a year ago

Getting to 512000000 then. Either way it's a problem.

GBY Over a year ago

More interestingly, the two print calls output the same result:

#define N 51200000
f = float(N);
fprintf(stdout, "%.7e\n", f);
for(int i = 0; i < N; i++) { f -= 1.0f; }
fprintf(stdout, "%.7e\n", f);

Ale Over a year ago

Actually a float has 24 bits of precision (23 stored + 1 hidden).

Ale Over a year ago

@MSalters Floating points are always normalized (with the exception of denormals). 5.12e+8 written in binary is 1.1110 1000 0100 1 e+11100, which requires 6 bits for exponent (5 + 1 for sign).

Praveen S · Accepted Answer · 2010-07-13 09:49:38Z

2

For more information on the precision of data types in C, please refer to this.

Your code is expected to give incorrect results once a value exceeds the precision its type provides.


Brian Hooper · Accepted Answer · 2010-07-13 10:33:03Z

2

Unreliable things to do with floating-point arithmetic include adding two numbers that are very different in magnitude and subtracting two numbers that are similar in magnitude. The first is what you are doing here: 1 is much smaller than 51200000. Before adding, the CPU shifts the smaller operand so that both have the same exponent; when the other operand is large, that shifts the actual value (1) off the end of the available precision, so by the time you are part way through the calculation, the 1 has effectively become zero.


josefx · Accepted Answer · 2010-07-13 10:35:43Z

2

Your problem is the unit of least precision (ULP). In short: large float values cannot be incremented by small values, because the result is rounded to the nearest valid float. While 1.0 is enough to increment small values, the minimal increment for 16777216 is 2.0 (checked with Java's Math.ulp; the same should hold for C++).

Boost has some functions for this.


zoli2k · Accepted Answer · 2010-07-13 09:44:10Z

0

A float has only about 7 significant decimal digits of precision. Adding 1 to a float that has reached 2^24 (16777216) no longer changes it. If you use double instead of float, you will get the correct result.


thejartender · Accepted Answer · 2012-06-04 08:16:49Z

0

Whilst editing the code in your question, I came across an unblocked for loop:

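#include <stdio.h>
#define N 51200000
int main() {
    float f = 0.0f;

    /* Braces make the loop body explicit: only the addition repeats. */
    for(int i = 0; i < N; i++) {
        f += 1.0f;
    }

    /* Print once, after the loop, as in the original program. */
    fprintf(stdout, "%f\n", f);
    return 0;
}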
