Commit

Add an omnigrok note
JasonGross committed Jan 25, 2024
1 parent 3a40b82 commit d2db3d3
Showing 1 changed file with 3 additions and 1 deletion.
4 changes: 3 additions & 1 deletion notebooks_jason/max_of_2_grokking.py
Expand Up @@ -714,5 +714,7 @@ def compute_traces_and_frames(
# However, because the test set includes so many more sequences, and the loss-reduction is carefully tuned to the train set, we see a sharp increase in test loss.
#
# But once the monotonicity violation is resolved, there's a sharp reversal in the generalization from the train set to the test set: now every adjustment helps the test set even more than the train set, so we see a sharp drop in loss.
#
#
# Added note: [2210.01117: Omnigrok: Grokking Beyond Algorithmic Data](https://arxiv.org/abs/2210.01117) claims that "Grokking is caused by the mismatch between training and test loss landscapes." This demo shows that this explanation isn't a complete picture, though, because there's still a phase transition in *training loss* even apart from the train-test loss landscape mismatch.
# %%
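The note above argues that there is a phase transition in *training loss* itself, not just in the train-test gap. As a hypothetical illustration (using a synthetic loss curve, not the notebook's actual data), one way to locate such a transition is to find the step with the steepest single-step drop in training loss:

```python
import numpy as np

# Hypothetical synthetic training-loss curve (not the notebook's data):
# loss falls sharply from ~2.0 to ~1.0 around step 600, mimicking a
# grokking-style phase transition in training loss.
steps = np.arange(1000)
train_loss = 2.0 - 1.0 / (1.0 + np.exp(-(steps - 600) / 10.0))

# Locate the phase transition as the step with the steepest one-step drop.
transition_step = int(np.argmin(np.diff(train_loss)))
print(transition_step)
```

On this synthetic curve the detected step lands at the center of the sigmoid drop (within a step of 600); on real training curves one would typically smooth the loss first, since per-step noise can dominate the finite difference.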
