Need help understanding the gradient function in PyTorch

Question

Consider the following code:

import numpy as np
import torch

w = np.array([[2., 2.],[2., 2.]])
x = np.array([[3., 3.],[3., 3.]])
b = np.array([[4., 4.],[4., 4.]])
w = torch.tensor(w, requires_grad=True)
x = torch.tensor(x, requires_grad=True)
b = torch.tensor(b, requires_grad=True)


y = w*x + b 
print(y)
# tensor([[10., 10.],
#         [10., 10.]], dtype=torch.float64, grad_fn=<AddBackward0>)

# all-ones gradient argument with the same shape and dtype as y
y.backward(torch.ones_like(y))

print(w.grad)
# tensor([[3., 3.],
#         [3., 3.]], dtype=torch.float64)

print(x.grad)
# tensor([[2., 2.],
#         [2., 2.]], dtype=torch.float64)

print(b.grad)
# tensor([[1., 1.],
#         [1., 1.]], dtype=torch.float64)

Since the tensor argument passed to backward() is all ones and has the same shape as the input tensors, my understanding is that:

w.grad is the derivative of y w.r.t. w, and should produce b,
x.grad is the derivative of y w.r.t. x, and should produce b, and
b.grad is the derivative of y w.r.t. b, and should produce all ones.

Of these, only the third matches the actual output. Can someone help me understand the first two results? I think I understand gradient accumulation, but I don't think that is what is happening here.

Should I post this on Data Science Stack Exchange?

– user3656142, Commented May 30, 2020 at 15:03

Michael Jungo · Accepted Answer · 2020-05-30 16:18:46Z

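To find the correct derivatives in this example, we need to take the sum and product rules into consideration. Note that `*` is an elementwise product here, so the rules apply to each entry of y = w*x + b independently.

Sum rule:

$$\frac{d}{dx}\bigl(f(x) + g(x)\bigr) = f'(x) + g'(x)$$

Product rule:

$$\frac{d}{dx}\bigl(f(x)\,g(x)\bigr) = f'(x)\,g(x) + f(x)\,g'(x)$$

That means the derivatives of your equation are calculated as follows.

With respect to x:

$$\frac{\partial y}{\partial x} = \frac{\partial (w x)}{\partial x} + \frac{\partial b}{\partial x} = \left(\frac{\partial w}{\partial x}\,x + w\,\frac{\partial x}{\partial x}\right) + 0 = 0 \cdot x + w \cdot 1 = w$$

With respect to w:

$$\frac{\partial y}{\partial w} = \frac{\partial (w x)}{\partial w} + \frac{\partial b}{\partial w} = \left(\frac{\partial w}{\partial w}\,x + w\,\frac{\partial x}{\partial w}\right) + 0 = 1 \cdot x + w \cdot 0 = x$$

With respect to b:

$$\frac{\partial y}{\partial b} = \frac{\partial (w x)}{\partial b} + \frac{\partial b}{\partial b} = 0 + 1 = 1$$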

The gradients reflect exactly that:

torch.equal(w.grad, x) # => True

torch.equal(x.grad, w) # => True

torch.equal(b.grad, torch.tensor([[1, 1], [1, 1]], dtype=torch.float64)) # => True
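Because every entry of w, x and b is the same in the question, it is hard to tell the gradients apart by eye. The sketch below is my own illustration, not part of the original code: it uses distinct values per entry and a non-uniform tensor passed to backward() (the name `upstream` and all the numbers are assumptions chosen for the example). It also checks that an all-ones argument gives the same result as differentiating y.sum().

```python
import torch

# Distinct values per entry (illustrative only) make it easy to see
# which tensor each gradient matches.
w = torch.tensor([[1., 2.], [3., 4.]], dtype=torch.float64, requires_grad=True)
x = torch.tensor([[5., 6.], [7., 8.]], dtype=torch.float64, requires_grad=True)
b = torch.tensor([[9., 10.], [11., 12.]], dtype=torch.float64, requires_grad=True)

y = w * x + b  # elementwise: y[i, j] = w[i, j] * x[i, j] + b[i, j]

# `upstream` is a hypothetical name for the tensor passed to backward().
# Each local derivative is multiplied elementwise by this tensor.
upstream = torch.tensor([[1., 10.], [100., 1000.]], dtype=torch.float64)
y.backward(upstream)

print(torch.equal(w.grad, upstream * x))  # dy/dw = x, scaled by upstream
print(torch.equal(x.grad, upstream * w))  # dy/dx = w, scaled by upstream
print(torch.equal(b.grad, upstream))      # dy/db = 1, scaled by upstream

# With an all-ones argument the scaling disappears, which is the same as
# differentiating the scalar y.sum(). torch.autograd.grad returns the
# gradients directly instead of accumulating them into .grad.
gw, gx, gb = torch.autograd.grad((w * x + b).sum(), (w, x, b))
print(torch.equal(gw, x))
print(torch.equal(gx, w))
print(torch.equal(gb, torch.ones_like(b)))
```

With the question's all-ones argument, the three gradients therefore reduce to x, w and ones, which is what the printed output shows.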
