
Why can we not parametrize and learn the non-linear activations? For example, look at the Leaky ReLU, which is $f(y)=y$ for $y>0$ and $f(y)=\alpha y$ for $y<0$. It seems that we could differentiate the parameter $\alpha$ with respect to the loss and learn it, so why is this not done?
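For concreteness, here is a minimal sketch of what I have in mind (assuming PyTorch; the module name `LearnableLeakyReLU` is just illustrative):

```python
import torch
import torch.nn as nn

class LearnableLeakyReLU(nn.Module):
    """Leaky ReLU whose negative slope alpha is learned by gradient descent."""
    def __init__(self, alpha_init: float = 0.01):
        super().__init__()
        # Registering alpha as a Parameter lets the optimizer update it.
        self.alpha = nn.Parameter(torch.tensor(alpha_init))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # f(y) = y for y > 0, alpha * y otherwise
        return torch.where(y > 0, y, self.alpha * y)

act = LearnableLeakyReLU()
x = torch.randn(5)
act(x).sum().backward()
print(act.alpha.grad)  # gradient of the (toy) loss w.r.t. alpha
```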

Saying "differentiate the parameters wrt the loss" makes no sense to me. What makes sense is to differentiate a function (e.g. the loss function) with respect to some parameter. (Commented May 3, 2023 at 23:39)

1 Answer


The ReLU is the simplest nonlinear function that has shown remarkable performance when used as an activation function in neural networks. Note that its derivative is binary: either zero or one, depending only on the sign of the input. This makes ReLU very fast and convenient to use.
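For example, a quick check in PyTorch (the sample values below are just an illustration):

```python
import torch

x = torch.tensor([-2.0, 3.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)  # tensor([0., 1.]): 0 for the negative input, 1 for the positive one
```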

Since its first use, many other ReLU-like functions have been proposed. For your specific case, PyTorch already implements the PReLU function, which has exactly the learnable slope parameter $a$ (your $\alpha$) from your formula.
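A minimal sketch of dropping `nn.PReLU` into a model (the layer sizes and data here are just placeholders):

```python
import torch
import torch.nn as nn

# nn.PReLU learns the negative slope 'a' jointly with the other weights.
# num_parameters=1 shares a single a across all channels; init sets its starting value.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.PReLU(num_parameters=1, init=0.25),
    nn.Linear(32, 1),
)

x = torch.randn(8, 16)
loss = model(x).pow(2).mean()
loss.backward()

prelu = model[1]
print(prelu.weight, prelu.weight.grad)  # the learnable a and its gradient
```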

