
The following gradient descent is failing because the gradients returned by tape.gradient() are None on the second iteration of the loop.

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])


for i in range(10):
  print("iter {}".format(i))
  with tf.GradientTape() as tape:
    # forward prop
    y = x @ w + b
    loss = tf.reduce_mean(y**2)
    print("loss is \n{}".format(loss))
    print("output- y is \n{}".format(y))
    # the watched variables get dropped after the first iteration
    print(tape.watched_variables())
  
  # get the gradients to minimize the loss
  dl_dw, dl_db = tape.gradient(loss, [w, b])

  # descend the gradients
  w = w.assign_sub(0.001*dl_dw)
  b = b.assign_sub(0.001*dl_db)
This is the output, including the error:

iter 0
loss is 
23.328645706176758
output- y is 
[[ 6.8125362  -0.49663293]]
(<tf.Variable 'w:0' shape=(3, 2) dtype=float32, numpy=
array([[-1.3461215 ,  0.43708783],
       [ 1.5931423 ,  0.31951016],
       [ 1.6574576 , -0.52424705]], dtype=float32)>, <tf.Variable 'b:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>)
iter 1
loss is 
22.634033203125
output- y is 
[[ 6.7103477  -0.48918355]]
()

TypeError                                 Traceback (most recent call last)
c:\projects\pyspace\mltest\test.ipynb Cell 7' in <cell line: 1>()
     11 dl_dw, dl_db = tape.gradient(loss,[w,b]) 
     13 #descend the gradients
---> 14 w = w.assign_sub(0.001*dl_dw)
     15 b = b.assign_sub(0.001*dl_db)

TypeError: unsupported operand type(s) for *: 'float' and 'NoneType'

I checked the documentation, which explains the cases in which gradients can become None, but none of those cases helped me resolve this.

  • This already has an accepted answer; adding a TensorFlow doc reference on the subject: tensorflow.org/guide/… – Aug 17, 2022

1 Answer


This is because assign_sub returns a Tensor. In the line w = w.assign_sub(0.001*dl_dw) you are therefore rebinding w to a Tensor holding the updated value. On the next iteration, w is no longer a Variable, so the gradient tape does not track it by default, and tape.gradient returns None for it. (Tensors also do not have an assign_sub method, so a later iteration would crash on that as well.)
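The rule at work here can be shown in isolation. A minimal standalone sketch (not from the question): by default, a GradientTape only watches trainable tf.Variable objects, so the gradient with respect to a plain tensor is None unless you watch it explicitly:

import tensorflow as tf

t = tf.constant(3.0)
with tf.GradientTape() as tape:
  y = t**2
# None: plain tensors are not watched by default
print(tape.gradient(y, t))

with tf.GradientTape() as tape:
  tape.watch(t)  # watch the tensor explicitly
  y = t**2
# tf.Tensor(6.0, shape=(), dtype=float32)
print(tape.gradient(y, t))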

Instead, simply write w.assign_sub(0.001*dl_dw), and the same for b. The assign functions work in place, so no reassignment is necessary.
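For reference, here is the loop from the question with that fix applied (same shapes and learning rate as in the question):

import tensorflow as tf

w = tf.Variable(tf.random.normal((3, 2)), name='w')
b = tf.Variable(tf.zeros(2, dtype=tf.float32), name='b')
x = tf.constant([[1., 2., 3.]])

for i in range(10):
  with tf.GradientTape() as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y**2)
  dl_dw, dl_db = tape.gradient(loss, [w, b])
  # update in place; do not rebind w or b to the return value
  w.assign_sub(0.001*dl_dw)
  b.assign_sub(0.001*dl_db)
  print("iter {}: loss {}".format(i, loss.numpy()))

Now w and b stay tf.Variable objects across iterations, the tape keeps watching them, and the loss decreases on every step.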
