
In neural nets for the regression problem, we rescale the continuous labels consistently with the output activation function, i.e. normalize them to [0,1] if the logistic sigmoid is used, or rescale them to [-1,1] if tanh is used. At the end we can restore the original range by mapping the output neurons' values back.
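(For concreteness, here is a minimal sketch of the label rescaling described above, assuming NumPy; the function names `scale_targets` and `unscale_predictions` are made up for this example and not from any library.)

```python
import numpy as np

def scale_targets(y, activation="logistic"):
    """Rescale continuous targets into the output activation's range.

    Returns the scaled targets plus (y_min, y_max) so predictions can
    later be mapped back to the original range. Illustrative sketch only.
    """
    y_min, y_max = y.min(), y.max()
    if activation == "logistic":              # logistic sigmoid outputs lie in [0, 1]
        y_scaled = (y - y_min) / (y_max - y_min)
    elif activation == "tanh":                # tanh outputs lie in [-1, 1]
        y_scaled = 2.0 * (y - y_min) / (y_max - y_min) - 1.0
    else:
        raise ValueError("unknown activation")
    return y_scaled, (y_min, y_max)

def unscale_predictions(y_pred, y_min, y_max, activation="logistic"):
    """Map network outputs back to the original target range."""
    if activation == "logistic":
        return y_pred * (y_max - y_min) + y_min
    return (y_pred + 1.0) / 2.0 * (y_max - y_min) + y_min
```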

Should we also normalize the input features? And how, especially when the hidden activation differs from the output activation? E.g. if the hidden activation is tanh and the output activation is logistic, should the input features be normalized to lie in the [0,1] or the [-1,1] interval?

1 Answer


The short answer is yes, you should also scale the input values, although the reasons behind it are quite different from those for the output neurons. The activation function simply makes some output values unreachable (a sigmoid can output only values in [0,1], tanh only in [-1,1]), while this is not true for the inputs (all activation functions are defined on the whole of R). Scaling the inputs is done to speed up convergence (so you don't end up in the "flat" part of the activation function), but there are no exact rules. At least three possibilities are widely used:

  • linear scaling to [0,1]
  • linear scaling to [-1,1]
  • normalization to the mean=0 and std=1

Each has its own pros and cons for specific datasets. As far as I know, the last one has the best statistical properties, but in the context of neural networks it is still just a rule of thumb. A sketch of all three appears below.
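(A minimal NumPy sketch of the three schemes listed above; the function name `scale_input` and the method labels are illustrative, not from any particular library.)

```python
import numpy as np

def scale_input(X, method="standardize"):
    """Apply one of three common column-wise input-scaling schemes."""
    if method == "minmax01":                  # linear scaling to [0, 1]
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)
    if method == "minmax11":                  # linear scaling to [-1, 1]
        lo, hi = X.min(axis=0), X.max(axis=0)
        return 2.0 * (X - lo) / (hi - lo) - 1.0
    if method == "standardize":               # mean = 0, std = 1
        return (X - X.mean(axis=0)) / X.std(axis=0)
    raise ValueError("unknown method")
```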


1 Comment

Thanks. I've also noticed that without rescaling the inputs there are problems with overflow - the exponents become huge. However, scaling solves this problem.
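(The overflow mentioned in the comment comes from evaluating exp on large pre-activations; a quick NumPy check, purely illustrative, reproduces it.)

```python
import numpy as np

def sigmoid(z):
    # Naive logistic sigmoid; np.exp overflows for large negative z.
    return 1.0 / (1.0 + np.exp(-z))

sigmoid(np.array([-1000.0]))           # RuntimeWarning: overflow encountered in exp
sigmoid(np.array([-1000.0]) / 1000.0)  # after scaling the input: no warning
```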
