Dying ReLU and ReLU Variants

The dying ReLU problem occurs when neurons receive too many negative inputs, resulting in their outputs being zero.

  • Consider a neural network layer where the majority of inputs to neurons are negative. With the ReLU activation function, these neurons will output zero. Over time, if these neurons do not receive positive inputs that could activate them, they will remain zero, contributing nothing to the learning process.

  • If dying ReLU deactivates a large share of the neurons (say, more than 50%), the model loses most of its effective capacity and can no longer learn useful patterns from the data.

  • When the output of ReLU becomes 0, its gradient is also 0, so the weights feeding that neuron receive no updates and the neuron can stay permanently dead (see the sketch after this list).

  • What are the solutions?

  • Initializing the bias to a small positive value such as 0.01 is a commonly suggested starting point: it nudges pre-activations toward the positive region, making it less likely that neurons start out, or get stuck, in the dead zone.
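
A minimal sketch of both points, assuming PyTorch (the framework, layer sizes, and even the 0.01 value are illustrative choices, not taken from these notes):

```python
import torch
import torch.nn as nn

# A ReLU unit with a negative pre-activation outputs zero AND has zero gradient,
# so its incoming weights receive no updates: the "dead" state.
z = torch.tensor([-3.0], requires_grad=True)
out = torch.relu(z)
out.backward()
print(out.item(), z.grad.item())   # 0.0 0.0

# One common mitigation: initialize biases to a small positive value (0.01)
# so pre-activations lean slightly positive at the start of training.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.constant_(m.bias, 0.01)
```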

Variants:

Piecewise-Linear Category (ReLU variants built from straight-line segments):

  • Leaky ReLU: A variant of ReLU that allows a small, non-zero gradient for negative inputs, preventing neurons from "dying".
  • Parametric ReLU (PReLU): Similar to Leaky ReLU, but with a learnable parameter for the slope of the negative part.
  • Both Leaky ReLU and PReLU are piecewise linear: each consists of two straight-line segments with different slopes for negative and positive inputs, and the kink at zero is what makes the overall function non-linear (see the sketch after this list).
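
A quick NumPy sketch of the two rules (the 0.01 and 0.25 slopes are just illustrative values):

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    # Two linear pieces: identity for z >= 0, a small fixed slope for z < 0.
    return np.where(z >= 0, z, slope * z)

def prelu(z, a):
    # Same shape, but the negative-side slope `a` is a learned parameter.
    return np.where(z >= 0, z, a * z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(z))        # approximately [-0.02, -0.005, 0.0, 1.5]
print(prelu(z, a=0.25))     # approximately [-0.5, -0.125, 0.0, 1.5]
```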

Non-Linear Category:

  • ELU (Exponential Linear Unit): Uses an exponential function for negative inputs, providing smoother and more robust learning.

  • SELU (Scaled Exponential Linear Unit): A variant of ELU that scales the output, helping to normalize the network.

  • ELU and SELU are inherently non-linear and provide more complex transformations than the piecewise-linear ReLU variants (a sketch of both follows this list).
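
A NumPy sketch of the two exponential rules; alpha = 1.0 for ELU is a common default, and the SELU constants below are the usual fixed values, but this code is an illustration rather than anything from these notes:

```python
import numpy as np

def elu(z, alpha=1.0):
    # An exponential curve for z < 0 instead of a straight line; alpha is fixed.
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

def selu(z, alpha=1.6733, scale=1.0507):
    # ELU with a fixed alpha and an output scale chosen for self-normalization.
    return scale * np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, 0.0, 1.5])
print(elu(z))    # approximately [-0.86, 0.0, 1.5]
print(selu(z))   # approximately [-1.52, 0.0, 1.58]
```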

  • Leaky ReLU: for z < 0 the output is 0.01 * z, a small negative value instead of zero; for z >= 0 the output is simply z.

  • Advantages: the gradient is never exactly zero for negative inputs, so neurons do not die, and the function stays almost as cheap to compute as plain ReLU.

  • Example Usage:

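A minimal usage sketch, assuming PyTorch (the tensor values are illustrative):

```python
import torch
import torch.nn as nn

# negative_slope=0.01 reproduces the 0.01 * z rule for z < 0.
leaky = nn.LeakyReLU(negative_slope=0.01)

z = torch.tensor([-2.0, -0.5, 0.0, 1.5])
print(leaky(z))   # tensor([-0.0200, -0.0050, 0.0000, 1.5000])
```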

  • Parametric ReLU (PReLU):

The slope a for negative inputs is not fixed; it is optimized during training, allowing the model to learn the best value for this parameter (the output is z for z >= 0 and a * z for z < 0).

This makes PReLU more flexible and potentially better-performing than Leaky ReLU, at the cost of the extra learnable parameter.

  • Example Usage:

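A minimal usage sketch, again assuming PyTorch (the starting slope of 0.25 is PyTorch's default and purely illustrative):

```python
import torch
import torch.nn as nn

# PReLU learns the negative-side slope; one parameter shared across all channels here.
prelu = nn.PReLU(num_parameters=1, init=0.25)

z = torch.tensor([-2.0, 0.0, 1.5])
print(prelu(z))                  # negative inputs are scaled by the current slope (starts at 0.25)
print(list(prelu.parameters()))  # the slope itself is a learnable Parameter updated by training
```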

  • ELU (Exponential Linear Unit):

  • Alpha is a fixed constant, not a learned parameter; its value is commonly set between 0.1 and 0.3. For z < 0 the output is alpha * (e^z - 1), while for z >= 0 it is simply z.

  • Disadvantage: the exponential operation makes ELU more expensive to compute than ReLU (a usage sketch for ELU and SELU follows this list).

  • SELU (Scaled Exponential Linear Unit): an ELU with a fixed alpha and an additional output scale, chosen so that activations tend toward zero mean and unit variance (self-normalization) in deep networks.

  • Derivative plot and advantages: the derivative equals the scale constant for z >= 0 and decays smoothly toward zero for z < 0, keeping gradients well-behaved; the self-normalizing property reduces the need for batch normalization in deep networks.
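
A short usage sketch for both, again assuming PyTorch (layer sizes and the alpha value are illustrative):

```python
import torch
import torch.nn as nn

# ELU: alpha is a fixed hyperparameter; SELU: alpha and scale are fixed constants
# chosen so that activations self-normalize in deep, properly initialized networks.
elu_block = nn.Sequential(nn.Linear(20, 64), nn.ELU(alpha=0.2), nn.Linear(64, 1))
selu_block = nn.Sequential(nn.Linear(20, 64), nn.SELU(), nn.Linear(64, 1))

x = torch.randn(4, 20)
print(elu_block(x).shape, selu_block(x).shape)   # torch.Size([4, 1]) torch.Size([4, 1])
```

For the self-normalizing behavior to hold in practice, SELU is typically paired with LeCun-normal weight initialization and AlphaDropout.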

Summary:

  • ReLU: Default choice, fast and simple.
  • Leaky ReLU: When ReLU neurons die and you need a simple fix.
  • PReLU: When you need flexibility to learn the negative slope.
  • ELU: When faster and more accurate learning is needed.
  • SELU: For very deep networks where self-normalization helps.