Traditional max pooling produces sparse gradients: within each pooling window, only the maximum element receives a gradient, which can hinder training. The smooth max pooling implementation here uses LogSumExp to approximate the maximum operation while distributing gradients across all elements of the window.
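The sketch below illustrates the general idea, assuming a PyTorch-style layer. The class name `SmoothMaxPool2d`, the `temperature` parameter, and the use of `F.unfold` are illustrative choices, not the project's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmoothMaxPool2d(nn.Module):
    """2D pooling that replaces the hard max with a temperature-scaled LogSumExp (illustrative sketch)."""

    def __init__(self, kernel_size, stride=None, temperature=1.0):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride if stride is not None else kernel_size
        self.temperature = temperature

    def forward(self, x):
        n, c, h, w = x.shape
        # Extract pooling windows: (N, C * k * k, L), where L is the number of windows.
        windows = F.unfold(x, self.kernel_size, stride=self.stride)
        windows = windows.view(n, c, self.kernel_size ** 2, -1)
        # Smooth max over each window: t * LSE(x / t) approaches max(x) as t -> 0.
        pooled = self.temperature * torch.logsumexp(windows / self.temperature, dim=2)
        out_h = (h - self.kernel_size) // self.stride + 1
        out_w = (w - self.kernel_size) // self.stride + 1
        return pooled.view(n, c, out_h, out_w)
```

For example, `SmoothMaxPool2d(kernel_size=2)(torch.randn(1, 3, 8, 8))` yields a tensor of shape `(1, 3, 4, 4)`; lowering `temperature` makes the output approach standard max pooling, while raising it spreads the gradient more evenly across the window.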
The LogSumExp (LSE) function provides a smooth approximation to the maximum function. Given a vector $x = (x_1, \ldots, x_n)$, it is defined as:

$$\mathrm{LSE}(x) = \log \sum_{i=1}^{n} e^{x_i}$$
The LSE function approximates the maximum with the following bounds:

$$\max_i x_i \;\le\; \mathrm{LSE}(x) \;\le\; \max_i x_i + \log n$$

where $n$ is the number of elements being pooled.
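As a quick numerical check of these bounds (assuming PyTorch's `torch.logsumexp`):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0])
lse = torch.logsumexp(x, dim=0)                 # ~4.44
lower = x.max()                                 # 4.0
upper = x.max() + torch.log(torch.tensor(4.0))  # 4.0 + log(4) ~ 5.39
print(lower.item() <= lse.item() <= upper.item())  # True
```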
The LSE function is convex and strictly increasing in each coordinate. Its gradient is the softmax function:

$$\frac{\partial\, \mathrm{LSE}(x)}{\partial x_i} = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = \mathrm{softmax}(x)_i$$
Because every element of the pooling window receives a nonzero gradient, this yields smooth, dense gradient flow through the network.
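The softmax gradient identity can be verified numerically with autograd; this is an illustrative check assuming PyTorch, not part of the project's test suite:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
torch.logsumexp(x, dim=0).backward()
# The gradient of LSE is exactly the softmax of the input.
print(torch.allclose(x.grad, torch.softmax(x, dim=0)))  # True
```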
This project is licensed under the MIT License - see the LICENSE file for details.