
Why can we not parametrize and learn the non-linear activations? For example, look at the Leaky ReLU, which is $f(y)=y$ for $y>0$ and $f(y)=\alpha y$ for $y<0$. It seems that we could differentiate the parameter $\alpha$ with respect to the loss and learn it, so why is this not done?
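For concreteness, here is a minimal sketch of what I have in mind (assuming PyTorch; the module name `LearnableLeakyReLU` is just illustrative):

```python
import torch
import torch.nn as nn

class LearnableLeakyReLU(nn.Module):
    """Leaky ReLU whose negative slope alpha is learned by gradient descent."""
    def __init__(self, alpha_init: float = 0.01):
        super().__init__()
        # Registering alpha as a Parameter lets the optimizer update it.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # f(y) = y for y > 0, alpha * y otherwise
        return torch.where(y > 0, y, self.alpha * y)

act = LearnableLeakyReLU()
x = torch.randn(5)
act(x).sum().backward()
print(act.alpha.grad)  # gradient of the (toy) loss w.r.t. alpha
```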

Saying "differentiate the parameters wrt the loss" makes no sense to me. What makes sense is to differentiate a function (e.g. the loss function) with respect to some parameter. (Commented May 3, 2023 at 23:39)

1 Answer


The ReLU is the simplest nonlinear function that has shown remarkable performance when used as an activation function in neural networks. Note that its derivative is binary: either zero or one, depending only on the sign of the input. This makes ReLU very fast and convenient to use.
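For example, a quick check in PyTorch (the sample values below are just an illustration):

```python
import torch

x = torch.tensor([-2.0, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 1.]): 0 for the negative input, 1 for the positive one
```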

Since its first use, many other ReLU-like functions have been proposed. For your specific case, PyTorch already implements the PReLU function, which has exactly the learnable slope parameter $a$ (your $\alpha$) from your formula.
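A minimal sketch of dropping `nn.PReLU` into a model (the layer sizes and data here are just placeholders):

```python
import torch
import torch.nn as nn

# nn.PReLU learns the negative slope 'a' jointly with the other weights.
# num_parameters=1 shares a single a across all channels; init sets its starting value.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.PReLU(num_parameters=1, init=0.25),
    nn.Linear(32, 1),
)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
loss.backward()

prelu = model[1]
print(prelu.weight, prelu.weight.grad)  # the learnable a and its gradient
```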

