I've been reading some things on neural networks and I understand the general principle of a single-layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used?

This question is followed by this one: What is a derivative of the activation function used for in backpropagation?

## Best Solution

The purpose of the activation function is to introduce **non-linearity into the network**.

In turn, this allows you to model a response variable (aka target variable, class label, or score) that varies non-linearly with its explanatory variables.
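As a small illustration of a response that varies non-linearly with its inputs, here is a sketch using XOR (the target is 1 when exactly one input is 1). The best least-squares affine fit can only predict 0.5 everywhere, while a tiny network with a non-linear hidden layer reproduces XOR exactly. The weights are hypothetical, picked by hand rather than learned, and ReLU is used instead of tanh only because it gives exact integer outputs; any non-linear activation makes the same point.

```python
import numpy as np

# XOR: no affine function of (x1, x2) reproduces this target exactly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

def best_affine_fit(X, y):
    """Least-squares fit of y ~ X @ w + b; returns the fitted predictions."""
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ coef

def relu(z):
    return np.maximum(z, 0.0)

def tiny_relu_net(X):
    """Two hidden ReLU units with hand-picked (hypothetical) weights."""
    W1 = np.array([[1.0, 1.0], [1.0, 1.0]])  # both hidden units see x1 + x2
    b1 = np.array([0.0, -1.0])               # unit 2 is active only when x1 + x2 > 1
    w2 = np.array([1.0, -2.0])               # output = h1 - 2 * h2
    h = relu(X @ W1.T + b1)
    return h @ w2

print(best_affine_fit(X, y))  # every prediction is (approximately) 0.5
print(tiny_relu_net(X))       # matches y: 0, 1, 1, 0
```

Removing the `relu` call turns `tiny_relu_net` into an affine function of the inputs, and it can then do no better than `best_affine_fit`.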

**Non-linear** means that the output cannot be reproduced from a linear combination of the inputs (which is not the same as an output that plots as a straight line; the word for that is **affine**).

Another way to think of it: without a **non-linear** activation function in the network, a NN, no matter how many layers it has, would behave just like a single-layer perceptron, because stacking (composing) linear layers gives you just another linear function (see the definition just above).

A common activation function used in backprop is the **hyperbolic tangent** (tanh), here evaluated from -2 to 2:

[Figure: hyperbolic tangent evaluated from -2 to 2]
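A minimal NumPy sketch of the "layers collapse" argument, assuming a toy network with 2 inputs, 2 hidden units, and 1 output (the weights are hypothetical, chosen only for illustration). Two stacked layers with no activation equal a single layer with `W = W2 @ W1` and `b = W2 @ b1 + b2`, and they pass an affine-map check; putting `tanh` on the hidden layer breaks that check. The last lines tabulate tanh over the -2 to 2 range mentioned above.

```python
import numpy as np

# Hypothetical weights for a 2-input, 2-hidden-unit, 1-output network.
W1 = np.array([[1.0, 2.0], [-1.0, 0.5]])
b1 = np.array([0.1, -0.2])
W2 = np.array([[1.0, -1.0]])
b2 = np.array([0.3])

def two_linear_layers(x):
    """Two stacked layers with no activation function."""
    return W2 @ (W1 @ x + b1) + b2

# Collapse algebraically: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2).
W_single = W2 @ W1
b_single = W2 @ b1 + b2

def one_linear_layer(x):
    return W_single @ x + b_single

def two_tanh_layers(x):
    """Same weights, with tanh applied to the hidden layer."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def looks_affine(f, x1, x2):
    """Any affine map satisfies f(x1 + x2) + f(0) == f(x1) + f(x2)."""
    zero = np.zeros_like(x1)
    return np.allclose(f(x1 + x2) + f(zero), f(x1) + f(x2))

x1, x2 = np.array([1.0, 1.0]), np.array([0.5, -2.0])
print(np.allclose(two_linear_layers(x1), one_linear_layer(x1)))  # True
print(looks_affine(two_linear_layers, x1, x2))  # True
print(looks_affine(two_tanh_layers, x1, x2))    # False

# tanh on a coarse grid from -2 to 2: roughly -0.964, -0.762, 0, 0.762, 0.964
grid = np.linspace(-2.0, 2.0, 5)
print(np.round(np.tanh(grid), 3))
```

Passing one pair of inputs through `looks_affine` cannot prove a map is affine, but a single failure (as with the tanh network) is enough to show it is not.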