C/C++ - How to convert from a signed 32bit integer to a float and back

Question

I need to be able to convert a C SInt32 integer to a float in the range [-1, 1] and back. I've seen discussions of this question regarding 24 bit integers:

C/C++ - Convert 24-bit signed integer to float

And I've tried something similar:

 // Convert int -> float
 SInt32 integer = 1;
 Float32 factor = 1;
 Float32 f = integer / (0x7FFFFFF + 0.5);

 // Perform some processing on the float
 Process(f);

 // Scale the float
 f = f * factor;

 // Convert float -> int
 integer = f * (0x7FFFFFF + 0.5);

However this doesn't work. I know it doesn't work because the work I'm doing involves audio programming and the conversion causes a hissing sound.

I'm pretty sure it is a conversion problem, because when I make the float smaller by setting the factor to 0.0001 the crackling disappears. Maybe the back conversion is pushing the int out of its limits and causing it to be truncated.

Any advice would be greatly appreciated.

user9876 · Accepted Answer · 2012-10-29 17:09:29Z

4

Read up on IEEE floating-point formats. The IEEE 32-bit float has only 24 significant bits, so converting a 32-bit integer can lose the low bits: up to 8 for an unsigned value, and up to 7 for a signed one, because one of its 32 bits holds the sign.
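The rounding is easy to observe with a round trip through `float`. The sketch below is only an illustration: `roundtrip_via_float` is a hypothetical helper name, and it assumes `float` is IEEE 754 binary32 with the default round-to-nearest mode.

```c
#include <stdint.h>

/* Hypothetical helper: round-trip a 32-bit integer through a float.
   Assumes float is IEEE 754 binary32 (24-bit significand) and that the
   rounded value still fits in int32_t, i.e. |x| stays well below 2^31. */
static int32_t roundtrip_via_float(int32_t x)
{
    float f = (float)x;   /* rounds to the nearest representable float */
    return (int32_t)f;    /* exact here: f is integral and in range */
}
```

Under those assumptions, values up to 2^24 = 16777216 survive unchanged, while 16777217 comes back as 16777216 and 1000000001 comes back as 1000000000.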


2 Comments

James Andrews Over a year ago

Thanks, so I need to use a 24 bit int.

Daniel Fischer Over a year ago

@BenSmiley Or convert to double, that gives you 53 bits of precision (usually).
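A minimal sketch of the `double` route, assuming `double` is IEEE 754 binary64; the helper names are hypothetical, and scaling by 2^31 is one possible choice that maps the integers onto [-1, 1) rather than the closed interval.

```c
#include <stdint.h>

/* Hypothetical helper names; assumes double is IEEE 754 binary64. */

/* Map an int32_t onto [-1, 1). Dividing by 2^31 only changes the
   exponent, so no bits are lost in a 53-bit significand. */
static double int32_to_unit_double(int32_t x)
{
    return (double)x / 2147483648.0;
}

/* Inverse mapping. Inputs at or beyond the ends of the range are clamped,
   because converting an out-of-range double to int32_t is not defined. */
static int32_t unit_double_to_int32(double d)
{
    double scaled = d * 2147483648.0;
    if (scaled >= 2147483647.0) return INT32_MAX;
    if (scaled <= -2147483648.0) return INT32_MIN;
    if (scaled != scaled) return 0;      /* NaN: pick a neutral value */
    return (int32_t)scaled;              /* truncates toward zero */
}
```

With this scaling every `int32_t` value round-trips exactly as long as the double is not modified in between; after processing, the conversion back truncates any fractional part.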

Aki Suihkonen · Accepted Answer · 2012-10-29 17:12:01Z

2

const float recip = 1.0 / (32768.0*65536.0);
// Hopefully the compiler will calculate this constant in advance.
// Written this way, a semi-advanced programmer can also immediately
// spot where the value comes from (2^15 * 2^16 = 2^31).
float value = int_value * recip;
int value2 = value * (32768.0*65536.0);

The process is not reversible: one can lose up to 7 bits of accuracy.
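When the float reaches 1.0, the product in the conversion back equals 2^31, which does not fit in a signed 32-bit integer. One way to guard against that is to clamp before the cast. The following is a sketch under stated assumptions, not part of the original answer: the names `SCALE_2_31`, `int32_to_unit_float` and `unit_float_to_int32` are hypothetical, and `float` is assumed to be IEEE 754 binary32.

```c
#include <stdint.h>

/* 2^31 is a power of two, so it is exactly representable as a float. */
#define SCALE_2_31 2147483648.0f

/* Map an int32_t to roughly [-1, 1]; the low bits may be rounded away. */
static float int32_to_unit_float(int32_t x)
{
    return (float)x / SCALE_2_31;
}

/* Map back, clamping instead of overflowing. Without the range checks,
   the cast of an out-of-range value would not have a defined result. */
static int32_t unit_float_to_int32(float f)
{
    float scaled = f * SCALE_2_31;
    if (scaled >= SCALE_2_31) return INT32_MAX;   /* f >= 1.0 */
    if (scaled <= -SCALE_2_31) return INT32_MIN;  /* f <= -1.0 */
    if (scaled != scaled) return 0;               /* NaN: neutral value */
    return (int32_t)scaled;                       /* truncates toward zero */
}
```

With these helpers a float of 1.0 maps to `INT32_MAX` and -1.0 maps to `INT32_MIN`; samples pushed outside the range by later processing are clipped rather than wrapped.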


6 Comments

Aki Suihkonen Over a year ago

Multiplying by these values isn't exactly what the OP wanted: the requested range includes both -1 and +1, whereas the integers range over [-2^n, 2^n - 1]. However, multiplying or dividing by (2^n - 1) produces slightly more noise.

Eric Postpischil Over a year ago

The question indicates that the float may have the value 1. (“[-1, 1]” denotes a closed interval; it includes its endpoints.) The calculation for value2 will convert a 1 in value to 2,147,483,648, which overflows a signed 32-bit integer.

Aki Suihkonen Over a year ago

@EricPostpischil - yes, I noticed. Luckily float 1.0 * 2^31 as an integer does not overflow, but it saturates to MAX_INT according to the IEEE-754 standard. I think it's a better design choice to clip a rare sample than to introduce quantization noise into every other sample.

Eric Postpischil Over a year ago

IEEE 754-2008 says, in clause 7.2, that conversion to an integer when the source is outside the destination range causes an invalid operation exception. The 2011 C standard says, in clause F.4, that the invalid operation exception is raised and the result is unspecified. If the C implementation does not support Annex F (which many do not), clause 6.3.1.4 says the behavior is undefined.

j-a Over a year ago

-1: this will not only lose up to 7 bits, it can lose all of them. Assuming int is 32 bits, converting float = 1.0 * (32768.0*65536.0) back to int will overflow, and the value would become negative: INT32_MIN.
