The following code
w = np.array([[2., 2.],[2., 2.]])
x = np.array([[3., 3.],[3., 3.]])
b = np.array([[4., 4.],[4., 4.]])
w = torch.tensor(w, requires_grad=True)
x = torch.tensor(x, requires_grad=True)
b = torch.tensor(b, requires_grad=True)
y = w*x + b
print(y)
# tensor([[10., 10.],
# [10., 10.]], dtype=torch.float64, grad_fn=<AddBackward0>)
y.backward(torch.FloatTensor([[1, 1],[ 1, 1]]))
print(w.grad)
# tensor([[3., 3.],
# [3., 3.]], dtype=torch.float64)
print(x.grad)
# tensor([[2., 2.],
# [2., 2.]], dtype=torch.float64)
print(b.grad)
# tensor([[1., 1.],
# [1., 1.]], dtype=torch.float64)
As the tensor argument inside gradient function is an all ones tensor in the shape of the input tensor, my understanding says that
w.gradmeans derivative ofyw.r.tw, and producesb,x.gradmeans derivative ofyw.r.tx, and producesbandb.gradmeans derivative ofyw.r.tb, and produces all ones.
Out of these, only point 3 answer is matching my expected result. Can someone help me in understanding the first two answers. I think I understand the accumulation part, but don't think that is happening here.




