
The .grad attribute becomes None if I expand the expression. Not sure why — can somebody give me a clue? If I expand it, w1.grad.zero_() throws: AttributeError: 'NoneType' object has no attribute 'zero_'

Thanks, Ganesh

import torch

x = torch.randint(size = (1,2), high = 10)
w = torch.Tensor([16,-14])
b = 36

y = w * x + b

epoch = 20
learning_rate = 0.01

w1 = torch.rand(size= (1,2), requires_grad= True)
b1 = torch.ones(size = [1], requires_grad= True)

for i in range(epoch):
    y1 = w1 * x + b1

    loss = torch.sum((y1-y)**2)

    loss.backward()

    with torch.no_grad():
        # w1 = w1 - learning_rate * w1.grad  # Not working: w1.grad becomes None, not sure how ;(
        # b1 = b1 - learning_rate * b1.grad

        w1 -= learning_rate * w1.grad  # Working code
        b1 -= learning_rate * b1.grad

        w1.grad.zero_()
        b1.grad.zero_()

    print("B ", b1)  
    print("W ", w1)
  • You're asking the wrong question. You should ask yourself why you have a NoneType (i.e. value None) instead of what your code expected. Also, ask yourself why your code expects that. Commented Feb 11, 2020 at 14:51
  • @UlrichEckhardt, I just pasted the whole code so that anyone can copy, paste, and execute it to see more insights. Commented Feb 12, 2020 at 5:58

2 Answers


The thing is that in your working code you are modifying an existing variable, which has a grad attribute, while in the non-working case you are creating a new variable.

Since a new w1/b1 variable is created, it has no gradient attribute: you didn't call backward() on it, but on the "original" variable.

First, let's check whether that's really the case:

print(id(w1)) # Some id returned here
w1 = w1 - learning_rate * w1.grad

# In case below w1 address doesn't change
# w1 -= learning_rate * w1.grad 

print(id(w1)) # Another id here

Now, you could copy it in-place and not break it, but there is no point in doing so and your working case is much clearer. Still, for posterity's sake:

w1.copy_(w1 - learning_rate * w1.grad)
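To make that concrete, here is a minimal sketch (with a toy loss of my own, not the question's) showing that copy_ mutates the same leaf tensor, so w1.grad survives and w1.grad.zero_() keeps working:

```python
import torch

w1 = torch.rand(1, 2, requires_grad=True)  # leaf tensor

loss = (w1 ** 2).sum()  # toy loss, just to populate w1.grad
loss.backward()
assert w1.grad is not None  # backward() filled the leaf's gradient

with torch.no_grad():
    # in-place copy: w1 is still the SAME tensor object (same id)
    w1.copy_(w1 - 0.01 * w1.grad)

# the leaf and its .grad are intact, so zeroing works
w1.grad.zero_()
```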



The code you provided updates the parameters w and b using gradient descent. In the first line, w.grad is the gradient of the loss function with respect to the parameter w, and lr is the learning rate, a scalar that determines the step size in the direction of the gradient.

The second line, b = b - b.grad * lr, updates the parameter b in the same way, by subtracting the gradient of the loss with respect to b multiplied by the learning rate.

However, that line is the problem: it should be b -= b.grad * lr instead of b = b - b.grad * lr.

Using b = b - b.grad * lr re-binds the name b to a brand-new tensor. That new tensor was never part of the backward() call, so its .grad attribute is None, and b.grad.zero_() fails.

On the other hand, b -= b.grad * lr updates the tensor in place, so b remains the original leaf tensor that backward() populated, and b.grad is not None.
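A minimal sketch (using a toy loss I made up for illustration) of the two cases side by side:

```python
import torch

b = torch.ones(1, requires_grad=True)  # leaf tensor
loss = (b * 3).sum()                   # toy loss
loss.backward()                        # populates b.grad

lr = 0.01
with torch.no_grad():
    # out-of-place: creates a NEW tensor; it never saw backward()
    b_new = b - b.grad * lr
    # in-place: mutates the original leaf, which keeps its .grad
    b -= b.grad * lr

print(b_new.grad)  # None
print(b.grad)      # still a tensor, safe to zero_()
```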

