Why we need Batch Normalization

  • Although using He initialization along with ELU (or any variant of ReLU) can significantly reduce the vanishing/exploding gradients problems at the beginning of training, it doesn’t guarantee that they won’t come back during training.

Batch Normalization

  • Batch Normalization consists of adding an operation in the model just before or after the activation function of each hidden layer.
  • This operation simply
    • zero-centers and normalizes each input,
    • then scales and shifts the result using two new parameter vectors per layer: one for scaling, the other for shifting.
  • In other words, the operation lets the model learn the optimal scale and mean of each of the layer’s inputs.
  • In many cases, if you add a BN layer as the very first layer of your neural network, you do not need to standardize your training set (e.g., using a StandardScaler).
  • The BN layer will do it for you (well, approximately, since it only looks at one batch at a time, and it can also rescale and shift each input feature).
  • In order to zero-center and normalize the inputs, the algorithm estimates each input’s mean and standard deviation over the current mini-batch (hence the name “Batch Normalization”).
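
The operation described above can be sketched in a few lines of plain Python. This is a minimal illustration of the training-time computation only, not the Keras implementation: the function name `batch_norm`, the smoothing term `eps`, and the sample values are assumptions made for this example, and `gamma` and `beta` stand in for the per-layer scaling and shifting vectors that a real layer would learn.

```python
import math

def batch_norm(batch, gamma, beta, eps=1e-5):
    """Zero-center and normalize each feature over the mini-batch,
    then scale by gamma and shift by beta.

    batch: list of instances, each a list of feature values
    gamma, beta: one scale and one shift value per feature
    eps: small smoothing term that avoids division by zero (assumed value)
    """
    m = len(batch)      # number of instances in the mini-batch
    n = len(batch[0])   # number of input features
    # Per-feature mean and variance, estimated over the current mini-batch
    mean = [sum(row[j] for row in batch) / m for j in range(n)]
    var = [sum((row[j] - mean[j]) ** 2 for row in batch) / m for j in range(n)]
    return [
        [gamma[j] * (row[j] - mean[j]) / math.sqrt(var[j] + eps) + beta[j]
         for j in range(n)]
        for row in batch
    ]

# Two features on very different scales (illustrative values)
batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]

# With gamma = 1 and beta = 0 each feature ends up with mean ~0 and variance ~1
normalized = batch_norm(batch, gamma=[1.0, 1.0], beta=[0.0, 0.0])

# Other values of gamma and beta give each feature a different scale and mean
shifted = batch_norm(batch, gamma=[2.0, 2.0], beta=[5.0, 5.0])
```

Because the statistics come from a single mini-batch, the result is only an approximation of standardizing the whole training set, which is the caveat noted in the list above.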

Finally, like a gift that keeps on giving, Batch Normalization acts like a regularizer, reducing the need for other regularization techniques (such as dropout).

Gradient Clipping

  • Gradient Clipping is a technique to mitigate the exploding gradients problem: clip the gradients during backpropagation so that they never exceed some threshold.
  • This technique is most often used in recurrent neural networks, as Batch Normalization is tricky to use in RNNs.
  • For other types of networks, Batch Normalization is usually sufficient.
  • In Keras, implementing Gradient Clipping is just a matter of setting the clipvalue or clipnorm argument when creating an optimizer, like this:
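```python
from tensorflow import keras

# Clip each gradient component to [-1.0, 1.0]; `model` is assumed to be built already
optimizer = keras.optimizers.SGD(clipvalue=1.0)
model.compile(loss="mse", optimizer=optimizer)
```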
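
To make the difference between the two arguments concrete, here is a small dependency-free sketch of what clipping by value and clipping by norm do to a single gradient vector. The helper names `clip_by_value` and `clip_by_norm` and the sample gradient are assumptions for illustration; they imitate the idea behind `clipvalue` and `clipnorm` and are not the Keras code itself.

```python
import math

def clip_by_value(grad, clip_value):
    """Clip every component independently to [-clip_value, clip_value]."""
    return [max(-clip_value, min(clip_value, g)) for g in grad]

def clip_by_norm(grad, clip_norm):
    """Rescale the whole vector if its L2 norm exceeds clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clip_norm:
        return list(grad)  # already within the threshold, leave unchanged
    return [g * clip_norm / norm for g in grad]

grad = [0.9, 100.0]  # illustrative gradient dominated by its second component

# Component-wise clipping: the result no longer points in the original direction
by_value = clip_by_value(grad, 1.0)

# Norm clipping: the direction is preserved, only the length shrinks to 1.0
by_norm = clip_by_norm(grad, 1.0)
```

Clipping by value can change the orientation of the gradient vector, whereas clipping by norm keeps the orientation and only shortens the vector. Which one works better is usually worth checking empirically on the validation set.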