When I run the snippet below:

import numpy as np

X = np.array([-0.20000008], dtype=np.float32)
np.floor((X + 1) / 0.04)
array([20.], dtype=float32)

The output is wrong: the exact quotient is slightly below 20, so it should floor to 19.

I understand that this is a precision error, but all of the variants below produce the correct result, even though I would expect them to have similar precision:

X = np.array([-0.20000008], dtype=np.float32).item()
np.floor((X + 1) / 0.04) # 19.0
X = np.float32(-0.20000008)
np.floor((X + 1) / 0.04) # 19.0
X = np.array([-0.20000008], dtype=np.float32)
np.floor(X / 0.04 + 1 / 0.04) # array([19.], dtype=float32)
np.floor(np.multiply((X + 1), 1/0.04)) # array([19.], dtype=float32)

Casting to float64 also works, but that cast is very expensive for my application. Are there any solutions that stick to float32?

  • What about using np.array([np.floor((x + 1) / 0.04) for x in X]) based on your observations? Commented Mar 3, 2022 at 6:26

1 Answer

Let's first try to understand the first two of the three examples at the bottom:

In the first example

np.array([-0.20000008], dtype=np.float32).item()

will produce a native Python float, which is 64-bit, so no surprises here.
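
As a quick check of this point, here is a minimal sketch comparing `.item()` with plain indexing. The names `x_py` and `x_np` are only illustrative and do not appear in the question.

```python
import numpy as np

X = np.array([-0.20000008], dtype=np.float32)

x_py = X.item()  # native Python float (64-bit)
x_np = X[0]      # still a NumPy float32 scalar

print(type(x_py))  # <class 'float'>
print(type(x_np))  # <class 'numpy.float32'>

# With a Python float the whole expression is evaluated in 64-bit precision.
result = np.floor((x_py + 1) / 0.04)
print(result)  # 19.0
```

The value stored in `X` is the same in both cases; only the precision used for the later arithmetic differs.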

In the second example you have created a NumPy 32-bit scalar (shape == (), type == np.float32), which is treated more or less like other scalars: as soon as you add an int (1), the result is a 64-bit number. (This describes the NumPy 1.x promotion rules; NumPy 2 changed scalar promotion, so the result may differ there.)

The interesting cases are your initial piece of code and the third example. In both you have an array (shape == (1,), type == np.ndarray), and in operations between an array and a scalar the dtype of the array is preserved, so everything is computed in float32. What remains is simply that the distributive law does not hold for floating-point numbers: (X + 1) / 0.04 and X / 0.04 + 1 / 0.04 round differently, and your computation depends on the least significant bits of the result.
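
To make the array case concrete, here is a small sketch that keeps the intermediate results and compares them with a float64 reference. The names `s`, `q` and `ref` are illustrative, and the float64 copy is used only for diagnosis, not as a suggested fix.

```python
import numpy as np

X = np.array([-0.20000008], dtype=np.float32)

s = X + 1     # array op with a Python int: stays float32
q = s / 0.04  # array op with a Python float: stays float32

# Reference computed in float64, for comparison only.
ref = (X.astype(np.float64) + 1) / 0.04

print(s.dtype, q.dtype)  # float32 float32
print(np.floor(q))       # [20.]  (q was rounded up to 20 in float32)
print(np.floor(ref))     # [19.]  (ref is just below 20)
```

The float32 quotient lands so close to 20 that rounding to the nearest representable value pushes it onto 20 exactly, which is why `np.floor` no longer sees a value below 20.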

7 Comments

Interesting. But how to explain that ((X + np.float32(1)) / 0.04).dtype is float32 and ((X[0] + np.float32(1)) / 0.04).dtype is float64?
In the first example X is an np.ndarray, while in the second one X[0] is an np.float32 scalar, so it's again a situation similar to what I mentioned in the answer. Note that X[0] is not the same as X.item() here: the former still gets you a NumPy object, while the latter is a native Python float!
So to answer your question: in the first case we have an array, and the dtype of the array is preserved through the operation with scalars; in the second case we have only scalars, one of which is a 64-bit float (0.04), so the result is a 64-bit number.
What about np.floor(np.multiply((X + 1), 1/0.04)), which produces 19.0 as well?
@ma7555 I'd suggest dividing it up into all the intermediate results, inspecting the types and then thinking about where exactly you find a difference to what you expect!
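
One possible way to do that inspection is sketched below. The helper `show` is hypothetical and exists only for this illustration. It prints the dtype and value of each intermediate for both the division and the `np.multiply` variant.

```python
import numpy as np

X = np.array([-0.20000008], dtype=np.float32)

def show(label, value):
    """Print the dtype and first element of an intermediate result."""
    arr = np.asarray(value)
    print(f"{label:<24} dtype={arr.dtype}  value={arr.ravel()[0]!r}")

s = X + 1
divided = s / 0.04                      # divisor is rounded to float32 first
multiplied = np.multiply(s, 1 / 0.04)   # 1 / 0.04 is evaluated in Python as 25.0

show("X + 1", s)
show("np.float32(0.04)", np.float32(0.04))
show("1 / 0.04", 1 / 0.04)
show("(X + 1) / 0.04", divided)
show("np.multiply(X + 1, 1/0.04)", multiplied)
```

In float32, 0.04 is stored as a value slightly below 0.04, while 25.0 is exactly representable, so the two variants apply different rounding to the same `X + 1` and can end up on opposite sides of 20.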