Adding CI check for exceeding loss tolerance #684

Merged (1 commit) on Jul 15, 2024

@rosslwheeler (Contributor) commented Jul 13, 2024

  • Modified the CI run of train_gpt32 (FP32) with gpt2_124M.bin to apply the requested parameter changes.
  • Added a check that fails if the loss differs from the expected value by more than 5 percent. The threshold is configurable, so we can lower it if that's more appropriate.

Tested in CI

@karpathy (Owner)

Sorry, I don't understand the history/context for this change. Is it following up on some conversation? Why are the args being changed around?

@rosslwheeler (Contributor, Author) commented Jul 13, 2024

Yes, this was a suggestion in our Discord conversation. I just replied to it there for your reference. Let me know if this is still of interest.

This is the output of the test in CI. It fails if any value isn't within the allowed percent difference. The Fixed Value on the left is taken from test_gpt2.cu.

Fixed Value: 5.270009, Read Value: 5.270006, Percent Difference: -0.00%
Fixed Value: 4.060681, Read Value: 4.060386, Percent Difference: -0.01%
Fixed Value: 3.320085, Read Value: 3.321317, Percent Difference: 0.04%
Fixed Value: 2.71755, Read Value: 2.718042, Percent Difference: 0.02%
Fixed Value: 2.181066, Read Value: 2.182476, Percent Difference: 0.06%
Fixed Value: 1.653923, Read Value: 1.654485, Percent Difference: 0.03%
Fixed Value: 1.16805, Read Value: 1.167975, Percent Difference: -0.01%
Fixed Value: 0.736873, Read Value: 0.736542, Percent Difference: -0.04%
Fixed Value: 0.401021, Read Value: 0.40138, Percent Difference: 0.09%
Fixed Value: 0.187493, Read Value: 0.188075, Percent Difference: 0.31%
Success: All values are within the allowed accuracy.
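For readers who want to see the shape of such a check, here is a minimal sketch in Python. It is not the script from this PR: the function names `percent_difference` and `check_losses` are hypothetical, and the formula (read value minus fixed value, divided by the fixed value) is an assumption inferred from the signs and magnitudes in the output above.

```python
def percent_difference(expected: float, actual: float) -> float:
    """Signed difference of `actual` from `expected`, as a percent of `expected`."""
    return (actual - expected) / expected * 100.0


def check_losses(expected, actual, tolerance_percent=5.0):
    """Compare two loss sequences step by step.

    Returns (ok, report_lines). `ok` is False if any step differs from its
    expected value by more than `tolerance_percent` in absolute terms.
    Hypothetical helper for illustration, not the PR's implementation.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")

    ok = True
    report_lines = []
    for fixed, read in zip(expected, actual):
        diff = percent_difference(fixed, read)
        report_lines.append(
            f"Fixed Value: {fixed}, Read Value: {read}, "
            f"Percent Difference: {diff:.2f}%"
        )
        # The tolerance applies to the magnitude, so both overshoot
        # and undershoot count as failures.
        if abs(diff) > tolerance_percent:
            ok = False
    return ok, report_lines
```

A CI wrapper around this would typically print the report lines and exit with a nonzero status when `ok` is False, so that the job fails. Lowering `tolerance_percent` tightens the check, which seems feasible given that the largest difference in the run above is 0.31%.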

@karpathy karpathy merged commit f45c219 into karpathy:master Jul 15, 2024
13 checks passed