Some notation for the question: $w_{ij}^l$ is the weight connecting the $i$th neuron of layer $l$ to the $j$th neuron of layer $l-1$; $z_i^l$ is the activation of the $i$th neuron in layer $l$ (for simplicity, assume a linear activation function); $e$ is the error function.
My deep learning professor wrote the following expression in a gradient calculation: \begin{equation} \nabla_{w_{11}^{2}}e = \frac{\partial e}{\partial z_1^2} \frac{\partial z_1^2}{\partial w_{11}^2} \end{equation}
However, according to the multivariate chain rule, $\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j}\frac{\partial y_j}{\partial x_i}$.
For example, when computing $\frac{\partial{e}}{\partial{w_{11}^2}}$, we would get $\frac{\partial{e}}{\partial{w_{11}^2}} = \sum_j \frac{\partial{e}}{\partial{z_j}} \frac{\partial{z_j}}{\partial{w_{11}^2}}$, where the sum runs over all activations $z_j$. For most $z_j$ the second factor in the sum is zero, but $z_1^3$ in the third layer *is* affected by a change in the weight $w_{11}^2$, and hence has a non-zero partial derivative, so the professor's two-factor expression seems to drop terms. Am I missing something?
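To make the setup concrete, here is a minimal numerical sketch of what I have in mind (the layer sizes, the quadratic error $e = \tfrac{1}{2}\lVert z^3\rVert^2$, and the use of NumPy are my own assumptions, not from the lecture). It evaluates the professor's two-factor expression for $\nabla_{w_{11}^2} e$ and compares it against a central finite-difference estimate:

```python
import numpy as np

# Toy three-layer linear network (my assumed setup):
# z^1 = input, z^2 = W2 @ z^1, z^3 = W3 @ z^2, error e = 0.5 * ||z^3||^2
rng = np.random.default_rng(0)
z1 = rng.normal(size=3)        # activations of layer 1 (the input)
W2 = rng.normal(size=(3, 3))   # W2[i, j] = w_{ij}^2
W3 = rng.normal(size=(3, 3))   # W3[i, j] = w_{ij}^3

def error(W2):
    z2 = W2 @ z1
    z3 = W3 @ z2
    return 0.5 * np.sum(z3 ** 2)

# Professor's expression: (de/dz_1^2) * (dz_1^2/dw_{11}^2)
z2 = W2 @ z1
z3 = W3 @ z2
de_dz2 = W3.T @ z3             # de/dz^2, backpropagated through layer 3
grad_formula = de_dz2[0] * z1[0]

# Finite-difference check on w_{11}^2 = W2[0, 0]
eps = 1e-6
W2p = W2.copy(); W2p[0, 0] += eps
W2m = W2.copy(); W2m[0, 0] -= eps
grad_numeric = (error(W2p) - error(W2m)) / (2 * eps)

print(grad_formula, grad_numeric)
```

In this toy run the two values agree, which is part of why I am confused about where the extra summation terms go.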
