
I was trying to mimic the result of a simple TensorFlow/Keras Dense layer with NumPy (forward pass only), and I was surprised not to get exactly the same result.

A Dense layer's output is just the product of the input vector and the weight matrix (ignoring the bias here), so I retrieved the weights from my Dense layer and used them with NumPy. However, I get slightly different results than when processing the input vector directly with TensorFlow/Keras.

Here is a minimal reproducible example:

import numpy as np
import keras
from keras import layers

print("Keras version:", keras.__version__)
print("Backend", keras.backend.backend())

# Keras model
ins = layers.Input((2,), name='input')
out = layers.Dense(5, kernel_initializer='random_normal', use_bias=False, name='output')(ins)
shallow_model = keras.Model(inputs=ins, outputs=out)

# Input
x = np.random.random(size=(5, 2)).astype(np.float32)

# Keras output
out_keras = shallow_model.predict(x)

# Get weights of Dense Layer
[kernel] = shallow_model.layers[1].get_weights()

# Try it in NumPy
out_numpy = np.matmul(x, kernel)

# Compare results
print("Keras result:\n", out_keras)
print("Numpy result:\n", out_numpy)
print("Same result:", np.allclose(out_keras, out_numpy))

An example of output:

Keras version: 3.3.3
Backend tensorflow
Keras result:
 [[-0.13240188  0.00676447 -0.11455889  0.00669269  0.00392148]
 [-0.04194738 -0.01847801 -0.06489066 -0.03474987 -0.0088181 ]
 [-0.12778029 -0.00793487 -0.13061695 -0.01940094 -0.00327162]
 [-0.07080866  0.0196894  -0.03897876  0.0323153   0.00993819]
 [-0.03812894 -0.01733573 -0.05973222 -0.0325517  -0.00827873]]
Numpy result:
 [[-0.13241094  0.00675571 -0.11458878  0.00667856  0.00392317]
 [-0.04195426 -0.01849106 -0.06492892 -0.03477099 -0.00881922]
 [-0.12777513 -0.00794449 -0.13064197 -0.01941477 -0.00326829]
 [-0.07078259  0.01968134 -0.03896207  0.03230151  0.00993471]
 [-0.0381208  -0.01734212 -0.05974622 -0.03256048 -0.00827707]]
Same result: False

Now I get that the results are close, but I was wondering where the difference comes from. Any ideas?

Edit: I believe it may have something to do with floating-point precision and operation order, as explained here: Numerical errors in Keras vs Numpy. I'd like a slightly more detailed answer if possible, in particular given that the only operation here is a matrix multiplication.
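
To illustrate what I mean by operation order, here is a toy sketch (unrelated to the model above, values made up) showing that float32 accumulation order alone can already change a result:

import numpy as np

# float32 addition is not associative: grouping the same three numbers
# differently gives two different results.
a = np.float32(1e-8)
b = np.float32(1.0)
c = np.float32(-1.0)

print((a + b) + c)  # 0.0   (the 1e-8 is lost when added to 1.0 first)
print(a + (b + c))  # 1e-08 (1.0 and -1.0 cancel exactly, so 1e-8 survives)

A matrix multiplication is a sum of such products, so two implementations that accumulate in a different order or with different intermediate precision can legitimately differ in the last bits.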

  • This question is similar to: Numerical errors in Keras vs Numpy. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Jul 18 at 10:04
  • I believe it is a little different as the question you cite is about Conv2D layers and the best answer IMO is "operations are not commutable". I believe that there is only 1 operation here that is matrix multiplication. I'll edit my post to add this. Commented Jul 18 at 10:15
  • 2
    ... matrix multiplication is still multiple operations under the hood though? @el_grezeq Commented Jul 18 at 10:17
  • Here we are "only" multiplying input of size (1, 2) with a kernel of size (2, 5) (5 times as batch size is 5) so only a few operations for each value (2 multiplications and 1 addition if I'm not wrong). That is why I'm surprised to have a difference of about 1e-5 magnitude for each value. Commented Jul 18 at 11:37

1 Answer


I was finally able to understand where the difference comes from. I was using a GPU for TensorFlow/Keras, so the computations are indeed performed differently than in NumPy, which runs on the CPU.

Using this to force TensorFlow/Keras to run on the CPU gave me the same result as in NumPy:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
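
An alternative sketch (assuming the TensorFlow backend, TF 2.x) is to hide the GPU at runtime instead of via the environment variable, or simply to compare with a tolerance that absorbs the GPU/CPU rounding differences:

import numpy as np
import tensorflow as tf

# Hide the GPU from TensorFlow; must be called before any GPU op runs.
tf.config.set_visible_devices([], 'GPU')

# Or keep the GPU and accept the small discrepancy with a looser tolerance:
# np.allclose(out_keras, out_numpy, atol=1e-5)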

1 Comment

you could mark your answer as accepted
