
The following code always seems to produce the wrong result. I have tested it with gcc and with Visual Studio on Windows. Is this because of float overflow, or something else? Thanks in advance :)

#include <stdio.h>
#define N 51200000
int main()
{
 float f = 0.0f;
 for(int i = 0; i < N; i++)
  f += 1.0f;
 fprintf(stdout, "%f\n", f);
 return 0;
}
  • Please clarify what you mean by "wrong result": what is your expected result, and what is your actual result? Commented Jul 13, 2010 at 9:41
  • @AakashM: the expected result is 51200000; the actual result would be a lot less. I haven't run the code, but I would guess the actual result is something like 16777216. Commented Jul 13, 2010 at 9:46
  • As Mike stated, the expected result is 51200000 and the actual result is 16777216. Sorry for the lack of clarity. Commented Jul 13, 2010 at 10:04

6 Answers


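This is not overflow; it is a lack of precision. A float has only 24 bits of precision (23 stored bits plus an implicit leading 1), so every integer up to 2^24 = 16777216 is exact, but many of the integers between 2^24 and 51200000 need 25 or 26 bits. Simply put, you do not have the precision required for a correct answer.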


5 Comments

Actually, you do not need 26 bits to represent 51200000 itself. It's 3125 * 2^14, so you need only 12 bits of precision and 4 bits of exponent. The actual problem is that 51199999 needs 26 bits of precision.
Getting to 51200000 is the issue, then. Either way, it's a problem.
More interesting: the two print statements output the same result: #define N 51200000 f = (float)N; fprintf(stdout, "%.7e\n", f); for(int i = 0; i < N; i++) { f -= 1.0f; } fprintf(stdout, "%.7e\n", f);
Actually, a float has 24 bits of precision (23 stored + 1 hidden).
@MSalters Floating-point numbers are always normalized (with the exception of denormals). 5.12e+7 written in binary is 1.1000 0110 101 e+11001, which requires 6 bits for the exponent (5 + 1 for sign).

For more information on the precision of C data types, please refer to this.

Your code will produce incorrect results once the values it computes exceed the precision of the type.


Unreliable things to do with floating-point arithmetic include adding two numbers that are very different in magnitude, and subtracting two numbers that are similar in magnitude. The first is what you are doing here: 1 is tiny compared with 51200000. To add them, the CPU aligns both operands to the same exponent; when the other operand is large, that shifts the smaller value (1) off the end of the available precision. So part way through the calculation (once f reaches 2^24 = 16777216), the 1 is effectively treated as zero and f stops growing.


Your problem is the unit of least precision (ULP). In short: a large float value cannot be incremented by a small value, because the result is rounded to the nearest representable float. An increment of 1.0 is enough for small values, but the gap between 16777216 and the next larger float is 2.0 (checked with Java's Math.ulp, but it should be the same in C and C++, where float is also IEEE 754 single precision on common platforms).

Boost.Math has some functions for this (for example, boost::math::float_next and boost::math::float_distance).


A float has only about 7 significant decimal digits of precision. Adding 1 to a float that is 2^24 (16777216) or larger gives a wrong, rounded result. If you use double instead of float, you will get the correct result.


Whilst editing the code in your question, I came across a for loop without braces:

#include <stdio.h>
#define N 51200000
int main() {
    float f = 0.0f;

    for(int i = 0; i < N; i++) {
        f += 1.0f;
        fprintf(stdout, "%f\n", f);
    }

    return 0;
}
