Some notation for the question: $w_{ij}^l$ is the weight connecting the $i$th neuron of layer $l$ to the $j$th neuron of layer $l-1$; $z_i^l$ is the activation of the $i$th neuron in layer $l$ (for simplicity, assume a linear activation function); $e$ is the error function.
My deep learning professor wrote the following expression in a gradient calculation: \begin{equation} \nabla_{w_{11}^{2}}e = \frac{\partial e}{\partial z_1^2} \frac{\partial z_1^2}{\partial w_{11}^2} \end{equation}
However, according to the multivariate chain rule, $\frac{\partial z}{\partial x_i} = \sum_j \frac{\partial z}{\partial y_j}\frac{\partial y_j}{\partial x_i}$.
For example, when computing $\frac{\partial{e}}{\partial{w_{11}^2}}$, we would get $\frac{\partial{e}}{\partial{w_{11}^2}} = \sum_j \frac{\partial{e}}{\partial{z_j}} \frac{\partial{z_j}}{\partial{w_{11}^2}}$, where the sum runs over all activations $z_j$. For most $z_j$ the second factor in the sum is zero, but $z_1^3$ in the third layer *is* affected by a change in the weight $w_{11}^2$, and hence has a non-zero partial derivative, so the professor's two-factor expression seems to drop terms. Am I missing something?
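To make the setup concrete, here is a minimal numerical sketch of what I have in mind (the layer sizes, the quadratic error $e = \tfrac{1}{2}\lVert z^3\rVert^2$, and the use of NumPy are my own assumptions, not from the lecture). It evaluates the professor's two-factor expression for $\nabla_{w_{11}^2} e$ and compares it against a central finite-difference estimate:

```python
import numpy as np

# Toy three-layer linear network (my assumed setup):
# z^1 = input, z^2 = W2 @ z^1, z^3 = W3 @ z^2, error e = 0.5 * ||z^3||^2
rng = np.random.default_rng(0)
z1 = rng.normal(size=3)        # activations of layer 1 (the input)
W2 = rng.normal(size=(3, 3))   # W2[i, j] = w_{ij}^2
W3 = rng.normal(size=(3, 3))   # W3[i, j] = w_{ij}^3

def error(W2):
    z2 = W2 @ z1
    z3 = W3 @ z2
    return 0.5 * np.sum(z3 ** 2)

# Professor's expression: (de/dz_1^2) * (dz_1^2/dw_{11}^2)
z2 = W2 @ z1
z3 = W3 @ z2
de_dz2 = W3.T @ z3             # de/dz^2, backpropagated through layer 3
grad_formula = de_dz2[0] * z1[0]

# Finite-difference check on w_{11}^2 = W2[0, 0]
eps = 1e-6
W2p = W2.copy(); W2p[0, 0] += eps
W2m = W2.copy(); W2m[0, 0] -= eps
grad_numeric = (error(W2p) - error(W2m)) / (2 * eps)

print(grad_formula, grad_numeric)
```

In this toy run the two values agree, which is part of why I am confused about where the extra summation terms go.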
