
I am practicing neural networks by building my own in notebooks, and I am trying to check my model against an equivalent model in Keras. My model seems to behave the same as other simple hand-coded neural network implementations, such as this one: https://towardsdatascience.com/coding-neural-network-forward-propagation-and-backpropagtion-ccf8cf369f76

However, as I increase the number of epochs, the weights of the Keras model slowly diverge from my own. I am training the network with plain gradient descent: the batch size equals the whole training set, and the Keras model's initial weights are set to the same values as my model's. (I have been doing this on the Iris data set for now, hence batch_size = 150.)

Is there something happening by default in Keras that makes the model I'm describing below function slightly differently from my model (or the one described in the article)? Like batch normalisation or something?

import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# set_weights, alpha, n_iter, X and y are defined earlier in the notebook.

# Layer sizes: 4 inputs -> 20 -> 10 -> 1 output.
network_shape = np.array([4, 20, 10, 1])
activations = ["relu", "relu", "sigmoid"]

model = Sequential()
model.add(Input(shape=(network_shape[0],)))
for i in range(len(activations)):
    model.add(Dense(units=network_shape[i + 1], activation=activations[i]))

# Start from the same initial weights as my hand-coded model.
model.set_weights(set_weights)

# Plain full-batch gradient descent: no momentum, no shuffling.
sgd = keras.optimizers.SGD(learning_rate=alpha, momentum=0.0)
model.compile(loss='binary_crossentropy', optimizer=sgd)

model.fit(X.T, y.T, batch_size=150, epochs=n_iter, verbose=0, shuffle=False)
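
To narrow down where the divergence starts, a single full-batch update can also be stepped through by hand and compared against one iteration of the hand-coded implementation. This is a minimal sketch; it assumes the model, X, y and alpha defined above:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
with tf.GradientTape() as tape:
    preds = model(X.T, training=True)
    loss = bce(y.T, preds)
# Gradient of the mean binary cross-entropy w.r.t. every trainable weight.
grads = tape.gradient(loss, model.trainable_variables)
# One plain gradient-descent step: w <- w - alpha * grad.
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(alpha * grad)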

1 Answer


If you want to train a model identical to the one from the article, you'll need identical initial weights and identical hyperparameters. Unless you're learning a very simple model, like y = mx + b, once your number of epochs exceeds the example model's, the weights won't be identical.
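
For example, one way to guarantee an identical starting point is to pull the initial weights out of Keras and reuse them in the hand-coded model, then compare the trained weights array by array. A minimal sketch (custom_weights stands in for however your own model stores its parameters):

import numpy as np

# Keras returns the weights as a flat list of numpy arrays:
# [W1, b1, W2, b2, ...], with kernels of shape (fan_in, fan_out).
init_weights = model.get_weights()

# ... feed init_weights into the custom implementation, train both ...

# Then compare layer by layer; any drift shows up here first:
for w_keras, w_custom in zip(model.get_weights(), custom_weights):
    print(np.max(np.abs(w_keras - w_custom)))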


5 Comments

Yes, exactly. I have set the weights so that the initial weights are the same, matched the number of epochs, and set the batch size to the size of the training data, with the same learning rate too. Can you think of any other hyperparameters with Keras defaults that might be making the weights different?
Not only all that; you'll also need to pass in the training data in the same order, batched and shuffled identically. Alternatively, you can run both models on a single training sample to check that they're equivalent. The weights being different doesn't mean your model is incorrect: as long as the loss and metrics look good and it outputs what you expect, you're fine. It's hard to be wrong with popular libraries like TensorFlow and PyTorch, since they do all the math for you.
Yeah, I understand that neither model would be incorrect, and I am not striving for right or wrong. I am trying to understand exactly what is going on under the hood in Keras, and why it differs from my interpretation of training a model in Python. I am using the same training data, setting the batch size to the whole training set, and setting shuffle to False, so I'm wondering where the difference comes from. Not the end of the world; I am just interested, that's all.
If you link a gist with the code you're comparing against, someone can help you better; it is difficult to help when the question lacks detail. If the example code is done via TensorFlow and you can't recreate it, then you have a coding error. Otherwise you can use gradient checking to confirm your custom code is valid.
Turns out it was a rounding error... Thanks for the help!
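
That "rounding error" is most likely a precision mismatch: Keras computes in float32 by default, while NumPy arrays default to float64, so a hand-coded NumPy model will slowly drift from Keras over many epochs even with identical maths. A quick check (a sketch; nothing beyond the imports is assumed):

import numpy as np
import tensorflow as tf

print(tf.keras.backend.floatx())   # 'float32' by default in Keras
print(np.array([1.0]).dtype)       # float64, NumPy's default
# Casting the custom model's arrays to float32, or setting Keras to
# float64 via tf.keras.backend.set_floatx('float64'), makes the two
# training runs numerically comparable.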
