# torch-sparse-optim

This library implements "sparser" versions of PyTorch optimizers, which apply momentum and weight decay updates only to parameters whose gradients are non-zero.
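
The masking idea can be sketched in a few lines of PyTorch. This is an illustrative sketch only, not the library's code; the variable names are made up, and a dense gradient with zero rows stands in for a sparse gradient:

```python
import torch

# Sketch of a "sparser" SGD-with-momentum step: momentum accumulation and
# weight decay only touch rows whose gradient is non-zero this step.
param = torch.nn.Parameter(torch.randn(10, 4))
grad = torch.zeros_like(param)
grad[[1, 7]] = torch.randn(2, 4)           # only rows 1 and 7 received gradients

momentum_buf = torch.zeros_like(param)
lr, momentum, weight_decay = 0.1, 0.9, 1e-2

with torch.no_grad():
    active = grad.abs().sum(dim=1) != 0    # rows with a non-zero gradient
    g = grad[active] + weight_decay * param[active]
    momentum_buf[active] = momentum * momentum_buf[active] + g
    param[active] -= lr * momentum_buf[active]
```

Rows that received no gradient are left completely untouched, whereas a stock optimizer would still decay them and keep feeding their (zero) gradients into the momentum buffer.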

It contains four optimizers (a hedged usage sketch follows the list):

- `SparserSGD`
- `SparserAdam`
- `SparserSGDW`
- `SparserAdamW`
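
Assuming these optimizers mirror the constructor signatures of their `torch.optim` counterparts (the import path and exact signature below are guesses, not taken from the library's documentation), they would be used as drop-in replacements:

```python
import torch
import torch.nn as nn

# Hypothetical usage sketch: SparserAdamW is assumed to accept the same
# arguments as torch.optim.AdamW.
from torch_sparse_optim import SparserAdamW

model = nn.Embedding(num_embeddings=50_000, embedding_dim=128)
optimizer = SparserAdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

batch = torch.randint(0, 50_000, (32, 16))
loss = model(batch).pow(2).mean()

optimizer.zero_grad()
loss.backward()    # rows not in `batch` get an all-zero gradient
optimizer.step()   # momentum/weight decay should only touch rows seen in `batch`
```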

The latter two follow the approach outlined in "Decoupled Weight Decay Regularization" by Loshchilov & Hutter (ICLR 2019).

Apart from SparserSGDW, they are straightforward ports of the corresponding PyTorch optimizers, modified only so that the momentum and weight decay updates are sparse. SparserSGDW additionally changes where and how weight decay is applied, following the paper above.
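
To make the SGDW change concrete, here is a minimal sketch of coupled versus decoupled weight decay for a plain SGD-with-momentum step (illustrative only; the sparsity masking from above is omitted for brevity):

```python
import torch

# Coupled (classic SGD): weight decay is folded into the gradient, so it also
# passes through the momentum buffer.
def sgd_step(param, grad, buf, lr, momentum, weight_decay):
    g = grad + weight_decay * param
    buf.mul_(momentum).add_(g)
    param.add_(buf, alpha=-lr)

# Decoupled (SGDW, per Loshchilov & Hutter): momentum sees only the raw
# gradient, and weight decay shrinks the parameter directly afterwards.
def sgdw_step(param, grad, buf, lr, momentum, weight_decay):
    buf.mul_(momentum).add_(grad)
    param.add_(buf, alpha=-lr)
    param.mul_(1 - lr * weight_decay)
```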