
Intro

I randomly create training data X with shape (10000, 10). The label Y is always equal to the first element of each sample in X.

E.g., suppose x1 = [0.1,0.2,0.3...,0.9], then y = 0.1. The dataset is created using the following code:

from numpy.random import RandomState
rdm = RandomState(1)           # fixed seed for reproducibility
data_size = 10000
xdim = 10
X = rdm.rand(data_size, xdim)  # uniform samples in [0, 1)
Y = [x1[0] for x1 in X]        # label = the first feature of each sample
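
A quick sanity check (reusing X and Y from the snippet above) confirms the shape and the mapping:

print(X.shape)        # (10000, 10)
print(X[0][0], Y[0])  # prints the same value twice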

I tried to create a one-layer neural network with only one node to learn this mapping. I expected the weights to converge to [1,0,0,0,0,0,0,0,0,0] and the bias to 0, so that the network extracts only the first element of x.
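
For reference, this expected solution can also be confirmed in closed form with NumPy's least-squares solver (a minimal sketch, reusing X, Y, and data_size from above):

import numpy as np
# solve [X, 1] @ [w; b] = Y with ordinary least squares
X_aug = np.hstack([X, np.ones((data_size, 1))])
coef, _, _, _ = np.linalg.lstsq(X_aug, np.array(Y), rcond=None)
print(coef.round(6))  # first entry ~1, all other weights and the bias ~0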

TensorFlow

Here is the code I implemented in TensorFlow. The training does not converge.

import tensorflow as tf
x = tf.placeholder(tf.float64, shape=(None, xdim))
y = tf.placeholder(tf.float64, shape=(None))

# for simplicity, initialize both weights and biases with zeros
Weights = tf.Variable(tf.zeros([xdim, 1], dtype=tf.float64))
biases = tf.Variable(tf.zeros([1], dtype=tf.float64))
y_predict = tf.matmul(x, Weights) + biases
loss = tf.losses.mean_squared_error(y_predict, y)
optimizer = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

batch_size = 100
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        start = i * batch_size % data_size
        end = min(start + batch_size, data_size)
        sess.run(optimizer, feed_dict={x: X[start:end], y: Y[start:end]})
        if i % 1000 == 0:
            ypred, training_loss = sess.run([y_predict, loss], feed_dict={x: X, y: Y})
            print("Epoch %d: loss=%g" % (i, training_loss))
    print('Weights:\n', sess.run(Weights))
    print('biases:\n', sess.run(biases))

The outputs are:

Epoch 0: loss=0.299163
Epoch 1000: loss=0.0838915
Epoch 2000: loss=0.0829176
Epoch 3000: loss=0.0825273
Epoch 4000: loss=0.08237
Epoch 5000: loss=0.0823084
Epoch 6000: loss=0.0822847
Epoch 7000: loss=0.0822745
Epoch 8000: loss=0.0822701
Epoch 9000: loss=0.082268
Epoch 10000: loss=0.0822669
Weights:
 [[ 0.01159591]
 [ 0.0003244 ]
 [ 0.00319655]
 [ 0.00113588]
 [-0.00079908]
 [-0.00086694]
 [ 0.00020551]
 [-0.00243378]
 [-0.00260724]
 [ 0.00052958]]
biases:
 [ 0.48771921]

Keras

import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy as np

model = Sequential()
model.add(Dense(units=1, input_dim=xdim, kernel_initializer='zeros', bias_initializer='zeros'))
model.compile(loss='mse', optimizer=keras.optimizers.SGD(lr=0.01))


batch_size = 100
for i in range(10001):
    start = i * batch_size % data_size
    end = min(start + batch_size, data_size)
    cost = model.train_on_batch(X[start:end], np.array(Y[start:end]))
    if i % 1000 == 0:
        print("Epoch %d: loss=%g" % (i, cost))
print('Weights:\n', model.get_weights()[0])
print('biases:\n', model.get_weights()[1])

The outputs:

Using TensorFlow backend.
Epoch 0: loss=0.284947
Epoch 1000: loss=0.00321839
Epoch 2000: loss=0.000247763
Epoch 3000: loss=5.40826e-05
Epoch 4000: loss=1.90453e-05
Epoch 5000: loss=7.40253e-06
Epoch 6000: loss=2.93623e-06
Epoch 7000: loss=1.17069e-06
Epoch 8000: loss=4.67434e-07
Epoch 9000: loss=1.86726e-07
Epoch 10000: loss=7.45764e-08
Weights:
 [[  9.99678493e-01]
 [ -3.00021959e-04]
 [ -2.89586897e-04]
 [ -2.90223019e-04]
 [ -2.83820234e-04]
 [ -2.82248948e-04]
 [ -2.96013983e-04]
 [ -3.13797180e-04]
 [ -3.20409046e-04]
 [ -3.11669020e-04]]
biases:
 [ 0.00153964]

Question

It seems like Keras gets the correct result, yet I used the same process in both implementations: the same zero initialization of weights and biases, the same loss function, and the same optimizer with the same learning rate. I cannot understand why this happens. Are there any problems/errors in my code?

1 Answer


You should swap the arguments of tf.losses.mean_squared_error in the TensorFlow implementation: its signature is mean_squared_error(labels, predictions), so the ground truth y must come first:

loss = tf.losses.mean_squared_error(y, y_predict) 

In addition, the shapes of y and y_predict are (batch_size,) and (batch_size, 1), respectively. You should squeeze y_predict before defining the loss, in order to avoid unwanted implicit broadcasting (see the sketch after the snippet):

y_predict = tf.matmul(x, Weights) + biases
y_predict = tf.squeeze(y_predict)
loss = tf.losses.mean_squared_error(y, y_predict)
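
To see why the squeeze matters, here is a minimal NumPy sketch of the implicit broadcasting (assuming a batch of 100, as in the question): subtracting a (100,) array from a (100, 1) array yields a (100, 100) matrix, so the mean is taken over 10,000 pairwise differences instead of the 100 intended ones.

import numpy as np
y_true = np.zeros(100)          # shape (100,), like the fed y placeholder
y_pred = np.zeros((100, 1))     # shape (100, 1), like tf.matmul(x, Weights) + biases
print((y_pred - y_true).shape)  # (100, 100): every prediction paired with every label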

1 Comment

Thanks, it works. The squeeze operation is the key!
