Artificial Neural Networks (ANN)
A simple artificial neural network (ANN) consisting of:
- an input layer with three nodes (orange)
- one hidden layer with four nodes (blue)
- an output layer with one node (yellow)
Adjacent layers of neural networks are usually fully connected.
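To make the structure concrete, here is a minimal sketch of a forward pass through the 3-4-1 network described above, assuming NumPy, randomly chosen weights, and biases omitted for brevity:

```python
import numpy as np

# Hypothetical weights for the 3-4-1 network described above (biases omitted).
rng = np.random.default_rng(seed=0)
W_hidden = rng.normal(size=(3, 4))    # input layer (3 nodes) -> hidden layer (4 nodes)
W_output = rng.normal(size=(4, 1))    # hidden layer (4 nodes) -> output layer (1 node)

x = np.array([0.5, -1.2, 3.0])        # one input sample with 3 variables

hidden = np.maximum(0, x @ W_hidden)  # fully connected layer + ReLU activation (see below)
output = hidden @ W_output            # single output node
print(output)
```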
The Neuron
One node of a layer is called a neuron. Each neuron has multiple weighted inputs (the independent variables) and a single output value.
For best performance, the input variables should be normalized to an equal scale, e.g. mean of 0 and standard deviation of 1 for all variables.
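A minimal sketch of such a standardization with NumPy (the feature values are made up for illustration):

```python
import numpy as np

X = np.array([[1500.0, 3.0],
              [2200.0, 4.0],
              [900.0,  2.0]])   # e.g. size in sqft, number of rooms

# Standardize each column to mean 0 and standard deviation 1.
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0), X_norm.std(axis=0))  # approximately [0, 0] and [1, 1]
```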
The activation function
Each neuron applies a preselected function to the sum of its weighted inputs. The choice of function depends on the desired output. The activation function is usually one of the following:
The threshold function is a binary function whose output is either 1 or 0 (yes/no).
The rectifier function takes the maximum of 0 and x, so it essentially sets all negative values to 0 and forwards positive ones. The rectifier is the most commonly used function for neurons of hidden layers and should be chosen by default. A neuron using the rectifier function is called a Rectified Linear Unit (ReLU).
The sigmoid function, σ(x) = 1 / (1 + e^(-x)), squashes its input into the range [0;1]. Since ReLUs are used for hidden nodes today, sigmoid functions are usually only used in output neurons when dealing with probabilities or other values in the range [0;1].
The tanh function is a rescaled version of the sigmoid function, with output range [-1;1]. In the past it was used as the activation function for hidden neurons, since a linear combination of tanh functions is able to approximate any continuous function. Due to the lower computational cost of ReLUs and the discovery of their benefits (no vanishing gradient; see Glorot et al., 2011, below), ReLUs have nearly completely replaced tanh neurons in most modern neural networks.
One additional important activation function is the softmax function. It is used for multiclass classification. The function is applied across all output nodes so that the output values sum to 1. In that way, each output node can be interpreted as a class and its output value as the probability that the input falls into this class.
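For reference, the activation functions above can be expressed in a few lines of NumPy (tanh is built in as np.tanh); a sketch:

```python
import numpy as np

def threshold(x):
    return (x >= 0).astype(float)    # binary output: 1 or 0

def relu(x):
    return np.maximum(0, x)          # negative values -> 0, positive values pass through

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # output in range (0, 1)

def softmax(x):
    e = np.exp(x - np.max(x))        # shift by the maximum for numerical stability
    return e / e.sum()               # output values sum to 1

z = np.array([-2.0, 0.5, 3.0])
print(relu(z), sigmoid(z), np.tanh(z), softmax(z))
```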
Network training
To train the neural network, all weights are randomly initialized. The first input of the dataset is assigned to the input variables of the network and the signal is forward-propagated (left to right) through the network. The resulting output is compared to the expected output and an error is measured by a predefined loss function.
Typical loss functions are:
- binary_crossentropy - used for a single output node and binary target variables
- categorical_crossentropy - used for categorical output where each node represents a category
- mean_squared_error - used for regression
There are many more loss functions available. See the Keras documentation for a complete list of predefined loss functions and their Python source code.
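As an illustration (a NumPy sketch, not the Keras implementation; the prediction values are made up), the binary cross-entropy and mean squared error losses can be computed directly:

```python
import numpy as np

y_true = np.array([1.0, 0.0, 1.0])   # expected outputs
y_pred = np.array([0.9, 0.2, 0.7])   # network outputs

# binary_crossentropy: heavily penalizes confident wrong predictions
bce = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# mean_squared_error: average squared difference, used for regression
mse = np.mean((y_true - y_pred) ** 2)

print(bce, mse)
```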
The error is then propagated backwards by updating the weights according to how much they contributed to the error. The learning rate controls how much the weights get updated in each step. This step is called back-propagation.
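A minimal sketch of this update rule for a single sigmoid neuron on made-up data (the full derivation is covered in the Trask article listed below):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
X = rng.normal(size=(10, 3))              # 10 samples, 3 input variables (made up)
y = (X.sum(axis=1) > 0).astype(float)     # made-up binary targets

w = rng.normal(size=3)                    # randomly initialized weights
learning_rate = 0.1

for epoch in range(100):
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))               # forward propagation (sigmoid)
    error = pred - y                                    # measured error per sample
    grad = X.T @ (error * pred * (1 - pred)) / len(y)   # back-propagation of the error
    w -= learning_rate * grad                           # update scaled by the learning rate
```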
The process of forward-propagation, error measurement and back-propagation is repeated many times; one full pass over the training data is called an epoch. The process can be modified by only updating the weights after a specific number of forward-propagations. This is called batch learning. A batch size of 10, for example, means that 10 inputs are forward-propagated before the weights are updated by back-propagation.
The batch size may influence the speed and results of the training.
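Putting the pieces together, here is a sketch using the Keras API (the data, layer sizes and parameters are illustrative, not tuned) for the 3-4-1 network described above:

```python
import numpy as np
from tensorflow import keras

X = np.random.normal(size=(100, 3))         # made-up training data, 3 input variables
y = (X.sum(axis=1) > 0).astype("float32")   # made-up binary targets

model = keras.Sequential([
    keras.Input(shape=(3,)),                      # input layer with 3 nodes
    keras.layers.Dense(4, activation="relu"),     # hidden layer with 4 ReLU neurons
    keras.layers.Dense(1, activation="sigmoid"),  # single output node in [0;1]
])
model.compile(optimizer="sgd", loss="binary_crossentropy")

# batch_size=10: weights are updated after every 10 forward propagations;
# epochs=5: the whole dataset is passed through the network 5 times.
model.fit(X, y, batch_size=10, epochs=5)
```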
Additional Reading
- Yann LeCun et al., 1998, Efficient BackProp
- Xavier Glorot et al., 2011, Deep sparse rectifier neural networks
- CrossValidated, 2015, A list of cost functions used in neural networks, alongside applications
- Andrew Trask, 2015, A Neural Network in 13 lines of Python (Part 2 – Gradient Descent)
- Michael Nielsen, 2015, Neural Networks and Deep Learning