
I was trying to mimic the result of a simple TensorFlow/Keras Dense layer with NumPy (forward pass only), and I was surprised not to get exactly the same result.

A Dense layer's output is just the product of the input vector and the weight matrix (ignoring the bias here), so I retrieved the weights from my Dense layer and used them with NumPy. However, I get slightly different results than when processing the input vector directly with TensorFlow/Keras.

Here is a minimal reproducible example:

import numpy as np
import keras
from keras import layers

print("Keras version:", keras.__version__)
print("Backend", keras.backend.backend())

# Keras model
ins = layers.Input((2,), name='input')
out = layers.Dense(5, kernel_initializer='random_normal', use_bias=False, name='output')(ins)
shallow_model = keras.Model(inputs=ins, outputs=out)

# Input
x = np.random.random(size=(5, 2)).astype(np.float32)

# Keras output
out_keras = shallow_model.predict(x)

# Get weights of Dense Layer
[kernel] = shallow_model.layers[1].get_weights()

# Try it in NumPy
out_numpy = np.matmul(x, kernel)

# Compare results
print("Keras result:\n", out_keras)
print("Numpy result:\n", out_numpy)
print("Same result:", np.allclose(out_keras, out_numpy))

An example of output:

Keras version: 3.3.3
Backend tensorflow
Keras result:
 [[-0.13240188  0.00676447 -0.11455889  0.00669269  0.00392148]
 [-0.04194738 -0.01847801 -0.06489066 -0.03474987 -0.0088181 ]
 [-0.12778029 -0.00793487 -0.13061695 -0.01940094 -0.00327162]
 [-0.07080866  0.0196894  -0.03897876  0.0323153   0.00993819]
 [-0.03812894 -0.01733573 -0.05973222 -0.0325517  -0.00827873]]
Numpy result:
 [[-0.13241094  0.00675571 -0.11458878  0.00667856  0.00392317]
 [-0.04195426 -0.01849106 -0.06492892 -0.03477099 -0.00881922]
 [-0.12777513 -0.00794449 -0.13064197 -0.01941477 -0.00326829]
 [-0.07078259  0.01968134 -0.03896207  0.03230151  0.00993471]
 [-0.0381208  -0.01734212 -0.05974622 -0.03256048 -0.00827707]]
Same result: False

Now I get that the results are close, but I was wondering where the difference comes from. Any ideas?

Edit: I believe it may have something to do with floating-point precision and operation order, as explained here: Numerical errors in Keras vs Numpy. I'd like a slightly more detailed answer if possible, in particular given that the only operation here is a matrix multiplication.
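
To illustrate what I mean by operation order, here is a toy sketch (unrelated to the model above, values made up) showing that float32 accumulation order alone can already change a result:

import numpy as np

# float32 addition is not associative: grouping the same three numbers
# differently gives two different results.
a = np.float32(1e-8)
b = np.float32(1.0)
c = np.float32(-1.0)

print((a + b) + c)  # 0.0   (the 1e-8 is lost when added to 1.0 first)
print(a + (b + c))  # 1e-08 (1.0 and -1.0 cancel exactly, so 1e-8 survives)

A matrix multiplication is a sum of such products, so two implementations that accumulate in a different order or with different intermediate precision can legitimately differ in the last bits.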

  • This question is similar to: Numerical errors in Keras vs Numpy. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. Commented Jul 18 at 10:04
  • I believe it is a little different as the question you cite is about Conv2D layers and the best answer IMO is "operations are not commutable". I believe that there is only 1 operation here that is matrix multiplication. I'll edit my post to add this. Commented Jul 18 at 10:15
  • 2
    ... matrix multiplication is still multiple operations under the hood though? @el_grezeq Commented Jul 18 at 10:17
  • Here we are "only" multiplying input of size (1, 2) with a kernel of size (2, 5) (5 times as batch size is 5) so only a few operations for each value (2 multiplications and 1 addition if I'm not wrong). That is why I'm surprised to have a difference of about 1e-5 magnitude for each value. Commented Jul 18 at 11:37

1 Answer


I was finally able to understand where the difference comes from. I was using a GPU for TensorFlow/Keras, so the computations are indeed performed differently than in NumPy, which runs on the CPU.

Using this to force TensorFlow/Keras to run on the CPU gave me the same result as in NumPy:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
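
An alternative sketch (assuming the TensorFlow backend, TF 2.x) is to hide the GPU at runtime instead of via the environment variable, or simply to compare with a tolerance that absorbs the GPU/CPU rounding differences:

import numpy as np
import tensorflow as tf

# Hide the GPU from TensorFlow; must be called before any GPU op runs.
tf.config.set_visible_devices([], 'GPU')

# Or keep the GPU and accept the small discrepancy with a looser tolerance:
# np.allclose(out_keras, out_numpy, atol=1e-5)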

1 Comment

you could mark your answer as accepted
