Pop-Art paper by DeepMind

About

This repository reproduces the example from the Pop-Art paper [1], as a tribute.

In addition, we compare Normalized SGD to Pop-Art SGD. The former rescales the gradients and the latter rescales the weights; for squared loss, the two are equivalent.
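To make the weight-rescaling side concrete, here is a minimal sketch of the two Pop-Art steps for a scalar linear output layer, following the description in [1]. It is not taken from this repository's code: the function names, the variance floor `eps`, and the example numbers are assumptions made for illustration.

```python
import numpy as np


def update_stats(mu, nu, y, beta, eps=1e-8):
    """ART step: track the target's first and second moments with
    exponential moving averages and derive the new scale.
    The variance floor `eps` is an assumption added for numerical safety."""
    mu_new = (1.0 - beta) * mu + beta * y
    nu_new = (1.0 - beta) * nu + beta * y ** 2
    sigma_new = np.sqrt(max(nu_new - mu_new ** 2, eps))
    return mu_new, nu_new, sigma_new


def preserve_outputs(w, b, mu, sigma, mu_new, sigma_new):
    """POP step: rescale the last layer's weights and bias so that the
    unnormalized output is unchanged by the new statistics."""
    w_new = w * sigma / sigma_new
    b_new = (sigma * b + mu - mu_new) / sigma_new
    return w_new, b_new


def unnormalized_output(w, b, mu, sigma, h):
    """Prediction in the original target scale: sigma * (w . h + b) + mu."""
    return sigma * (w @ h + b) + mu


# Hypothetical features, weights, and a single large target.
h = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.2, -0.3])
b, mu, nu, sigma = 0.05, 0.0, 1.0, 1.0

before = unnormalized_output(w, b, mu, sigma, h)
mu_new, nu_new, sigma_new = update_stats(mu, nu, y=1024.0, beta=0.1)
w_new, b_new = preserve_outputs(w, b, mu, sigma, mu_new, sigma_new)
after = unnormalized_output(w_new, b_new, mu_new, sigma_new, h)

# The statistics changed, but the prediction did not.
assert np.isclose(before, after)
```

After these two steps, an ordinary SGD step is taken on the normalized error `w_new @ h + b_new - (y - mu_new) / sigma_new`, so the step size is unaffected by the magnitude of the targets.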

[Figure: pop-art]

Run

To build the Docker image and run the example, use:

make run

References

[1] H. van Hasselt et al. Learning values across many orders of magnitude. NIPS 2016.