
load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions #265

Merged: 12 commits into master on Apr 28, 2024

Conversation

karpathy
Owner

Code to load bf16 weights directly, and also re-wire the position of tensors to put the layernorms (which are in fp32) at the end. The training loop seems to work ok, the tests pass, and the loss and optimization look ok, but the gradients don't match, which can't be right. So there is a bug, but it's a bit too late in the day for me to debug right now; creating a PR and going to sleep, will fix tomorrow.
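For illustration, a minimal sketch of the layout idea in the description: one allocation where the bf16 tensors come first as a contiguous block and the fp32 layernorm parameters sit at the end, so the leading bf16 region can be written to or read from disk in one shot. The names and sizes below are hypothetical and are not the actual llm.c ParameterTensors (and note the re-shuffle was ultimately dropped, per the later comments).

```c
// Hypothetical sketch of "bf16 tensors first, fp32 layernorms at the end".
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t bf16; // bfloat16 stored as a raw 16-bit pattern

int main(void) {
    // hypothetical tensor sizes, for illustration only
    size_t n_wte = 50257 * 768;   // token embedding
    size_t n_fcw = 768 * 3072;    // one mlp weight matrix
    size_t n_ln  = 2 * 768;       // one layernorm (weight + bias)

    size_t bf16_bytes = (n_wte + n_fcw) * sizeof(bf16);   // leading bf16 region
    size_t fp32_bytes = n_ln * sizeof(float);             // trailing fp32 region
    char *block = malloc(bf16_bytes + fp32_bytes);
    if (block == NULL) { return 1; }

    bf16  *wte = (bf16 *)block;                 // bf16 tensors start at offset 0
    bf16  *fcw = wte + n_wte;
    float *ln  = (float *)(block + bf16_bytes); // fp32 layernorms at the end

    printf("bf16 region: %zu bytes, fp32 (layernorm) region: %zu bytes\n",
           bf16_bytes, fp32_bytes);
    (void)fcw; (void)ln;
    free(block);
    return 0;
}
```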

@karpathy
Owner Author

Oh, and AFAIK this is an "inline merge" of #252. If this merges, #252 can close.

@karpathy karpathy marked this pull request as ready for review April 27, 2024 16:15
@karpathy
Owner Author

I moved this from DRAFT to a regular PR because it is technically done, AFAIK, and could possibly merge.
I'm doing a de-risking run comparing fp32 and bf16 on 1 epoch of TinyStories with this branch, and will post results before merging.

@karpathy
Owner Author

The only new functionality now, technically, is that the .py script writes a bf16 file directly, and the C code loads it directly if it is in bf16.
This means we can load/store our models and checkpoints at half the size => ~2X faster reads and smaller files on disk.
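A minimal sketch of the loading side, under an assumed header format (the header length, flag position, and file name below are made up for illustration and are not the actual llm.c checkpoint layout): the parameter bytes are read in whichever width the file was written in, so a bf16 checkpoint is half the bytes of an fp32 one.

```c
// Hypothetical bf16/fp32 checkpoint reader sketch (not the llm.c loader).
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("model.bin", "rb");   // hypothetical checkpoint path
    if (f == NULL) { fprintf(stderr, "could not open model.bin\n"); return 1; }

    int header[8];
    if (fread(header, sizeof(int), 8, f) != 8) { fclose(f); return 1; }
    int is_bf16 = header[1];               // assumed: 1 = bf16, 0 = fp32
    size_t num_params = (size_t)header[2]; // assumed: total parameter count

    size_t bytes_per_param = is_bf16 ? 2 : 4;  // bf16 is half the size of fp32
    void *params = malloc(num_params * bytes_per_param);
    if (params == NULL) { fclose(f); return 1; }
    if (fread(params, bytes_per_param, num_params, f) != num_params) {
        fprintf(stderr, "short read\n");
    }
    printf("loaded %zu params (%s, %zu bytes on disk)\n", num_params,
           is_bf16 ? "bf16" : "fp32", num_params * bytes_per_param);

    free(params);
    fclose(f);
    return 0;
}
```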

Sadly, in test_gpt2.cu I had to 3X some of the tolerances, for reasons I don't understand, since this change should be a total no-op; I just re-shuffled the memory around. This makes me feel a bit uncomfortable again, like there is still some bug lurking...
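For context, the checks in question compare the C results elementwise against the PyTorch reference within a tolerance; a simplified sketch of that kind of check (the function name and tolerance values here are illustrative, not the actual test_gpt2.cu code):

```c
// Simplified elementwise tensor comparison with an absolute tolerance,
// in the spirit of the checks in test_gpt2.cu (illustrative only).
#include <math.h>
#include <stdio.h>

// returns 1 if all elements of a and b agree to within `tol`, else 0
int check_tensor_close(const float *a, const float *b, int n, float tol) {
    int ok = 1;
    for (int i = 0; i < n; i++) {
        if (fabsf(a[i] - b[i]) > tol) {
            printf("mismatch at %d: %f vs %f\n", i, a[i], b[i]);
            ok = 0;
        }
    }
    return ok;
}

int main(void) {
    float ours[3] = {1.0f, 2.0f, 3.0f};
    float ref[3]  = {1.0f, 2.0f, 3.001f};
    // "3X the tolerances" just means widening tol, e.g. 1e-3f -> 3e-3f
    printf("pass = %d\n", check_tensor_close(ours, ref, 3, 3e-3f));
    return 0;
}
```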

The layernorms remain in their old places.

@karpathy karpathy changed the title load bf16 directly, re-shuffle position of tensors in ParameterTensors, place all LayerNorms (fp32) at the end load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions Apr 28, 2024
@karpathy karpathy merged commit d95b8d8 into master Apr 28, 2024
3 checks passed