
In my project I have to perform division, multiplication, subtraction, and addition on a matrix of double elements. The problem is that as the size of the matrix increases, the accuracy of my output is drastically affected. Currently I am using double for each element, which I believe uses 8 bytes of memory and has about 16 significant decimal digits of precision, irrespective of where the decimal point falls. Even for a large matrix, the memory occupied by all the elements is in the range of a few kilobytes, so I can afford data types that require more memory. So I wanted to know which data type is more precise than double. I tried searching in some books and found long double, but I don't know what its precision is. And what if I want more precision than that?

  • Check out the GMP project. Also, there are methods to minimize round-off error in computations. Commented Mar 27, 2013 at 13:14
  • In case you can rely on external dependencies, Boost 1.53 has a Multiprecision library that can help you! Commented Mar 27, 2013 at 13:21
  • Using a little algebra to rearrange mathematical calculations can help to reduce rounding errors. Commented Mar 27, 2013 at 13:26
  • Switching to a larger type merely delays the numerical collapse. To avoid it completely, crack out a numerical analysis book and read the chapter on "stability". Commented Mar 27, 2013 at 13:30
  • Numerical collapse is the phenomenon you're experiencing: rounding errors accumulate and lead to a wrong answer. Commented Mar 27, 2013 at 13:35

4 Answers

According to Wikipedia, the 80-bit "Intel" x87 extended-precision long double (usually padded to 12 or 16 bytes in memory) has a 64-bit mantissa with no implicit bit, which gets you about 19 significant decimal digits. This has been the almost universal standard for long double for ages, but recently things have started to change.

The newer 128-bit quad-precision format has 112 mantissa bits plus an implicit bit, which gets you 34 decimal digits. GCC implements it as the __float128 type, and there is (if memory serves) a compiler option to make long double refer to it.


10 Comments

so who would you recommend between long double & __float128, considering the tradeoff involved in speed & accuracy?
@Cool_Coder I don't know the characteristics of your program, but since it's easy, just try both!
ok I will & let you know. Just for the sake of it let me know if the following is incorrect: __float128 *nicePrecision = new __float128();
128-bit floats aren't all that new. They're the long double type on SPARC, which has been around for ages (as in, a little more than twenty years).
@PeteBecker That's still a lot newer than the 8087! And the standardization only came about in 2008, unless I'm mistaken. Anyway my impression is that they're gaining traction now because the legacy 80-bit hardware is mostly gone.

You might want to consider the sequence of operations, i.e., do the additions in an ordered sequence starting with the smallest values. This increases the overall accuracy of the results for the same mantissa precision:

1e00 + 1e-16 + ... + 1e-16 (1e16 times) = 1e00
1e-16 + ... + 1e-16 (1e16 times) + 1e00 = 2e00

The point is that adding small numbers to a large number makes them disappear, so the latter approach reduces the numerical error.

Comments


Floating point data types with greater precision than double are going to depend on your compiler and architecture.

In order to get more than double precision, you may need to rely on a math library that supports arbitrary-precision calculations. These probably won't be fast, though.

3 Comments

"These probably won't be fast enough" <- Fast enough for what? What makes you say that? And what alternatives do you suggest if one does need more precision?!
You sort of seem to be ignoring the existence of long double. The same issues do sort of apply, but to a much lesser extent.
@us2012 I just said probably won't be fast, not not fast enough. So yes, it depends a lot on what the OP is trying to do. I'd suggest a math library if I knew one, but my experience with arbitrary precision like this is limited to other languages.

On Intel architectures, long double is the 80-bit extended-precision type.

What kind of values do you want to represent? Maybe you are better off using fixed-point arithmetic.

3 Comments

long float? Really? 80 bits precision and how many go into the exponent?
Depends on the compiler; with MS, a long double has the same precision as a double.
I meant long double, it was just a glitch.
