
load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions #265

Merged: 12 commits into master on Apr 28, 2024

Conversation

karpathy
Owner

Code to load bf16 weights directly, and also re-wire the position of tensors to put the layernorms (which are in fp32) at the end. The training loop seems to work ok, the tests pass, and the loss and optimization look ok, but the gradients don't match, which can't be right. So there is a bug, but it's a bit too late in the day for me to debug right now; creating a PR and going to sleep, will fix tomorrow.
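For illustration, a minimal sketch of the layout idea in the description: one allocation where the bf16 tensors come first as a contiguous block and the fp32 layernorm parameters sit at the end, so the leading bf16 region can be written to or read from disk in one shot. The names and sizes below are hypothetical and are not the actual llm.c ParameterTensors (and note the re-shuffle was ultimately dropped, per the later comments).

```c
// Hypothetical sketch of "bf16 tensors first, fp32 layernorms at the end".
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint16_t bf16; // bfloat16 stored as a raw 16-bit pattern

int main(void) {
    // hypothetical tensor sizes, for illustration only
    size_t n_wte = 50257 * 768;   // token embedding
    size_t n_fcw = 768 * 3072;    // one mlp weight matrix
    size_t n_ln  = 2 * 768;       // one layernorm (weight + bias)

    size_t bf16_bytes = (n_wte + n_fcw) * sizeof(bf16);   // leading bf16 region
    size_t fp32_bytes = n_ln * sizeof(float);             // trailing fp32 region
    char *block = malloc(bf16_bytes + fp32_bytes);
    if (block == NULL) { return 1; }

    bf16  *wte = (bf16 *)block;                 // bf16 tensors start at offset 0
    bf16  *fcw = wte + n_wte;
    float *ln  = (float *)(block + bf16_bytes); // fp32 layernorms at the end

    printf("bf16 region: %zu bytes, fp32 (layernorm) region: %zu bytes\n",
           bf16_bytes, fp32_bytes);
    (void)fcw; (void)ln;
    free(block);
    return 0;
}
```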

@karpathy
Owner Author

Oh, and AFAIK this is an "inline merge" of #252. If this merges, #252 can close.

@karpathy karpathy marked this pull request as ready for review April 27, 2024 16:15
@karpathy
Owner Author

I moved this from DRAFT to a regular PR because it is technically done, AFAIK, and could possibly merge.
I'm doing a de-risking run comparing fp32 and bf16 on 1 epoch of TinyStories with this branch, and will post results before merging.

@karpathy
Owner Author

The only new functionality now, technically, is that the .py script writes a bf16 file directly, and the C code loads it directly if it is in bf16.
This means we can load/store our models and checkpoints at half the size => ~2X faster reads and smaller files on disk.
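A minimal sketch of the loading side, under an assumed header format (the header length, flag position, and file name below are made up for illustration and are not the actual llm.c checkpoint layout): the parameter bytes are read in whichever width the file was written in, so a bf16 checkpoint is half the bytes of an fp32 one.

```c
// Hypothetical bf16/fp32 checkpoint reader sketch (not the llm.c loader).
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int main(void) {
    FILE *f = fopen("model.bin", "rb");   // hypothetical checkpoint path
    if (f == NULL) { fprintf(stderr, "could not open model.bin\n"); return 1; }

    int header[8];
    if (fread(header, sizeof(int), 8, f) != 8) { fclose(f); return 1; }
    int is_bf16 = header[1];               // assumed: 1 = bf16, 0 = fp32
    size_t num_params = (size_t)header[2]; // assumed: total parameter count

    size_t bytes_per_param = is_bf16 ? 2 : 4;  // bf16 is half the size of fp32
    void *params = malloc(num_params * bytes_per_param);
    if (params == NULL) { fclose(f); return 1; }
    if (fread(params, bytes_per_param, num_params, f) != num_params) {
        fprintf(stderr, "short read\n");
    }
    printf("loaded %zu params (%s, %zu bytes on disk)\n", num_params,
           is_bf16 ? "bf16" : "fp32", num_params * bytes_per_param);

    free(params);
    fclose(f);
    return 0;
}
```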

Sadly, in test_gpt2.cu I had to 3X some of the tolerances, for reasons I don't understand, since this change should be a total no-op; I just re-shuffled the memory around. This makes me feel a bit uncomfortable again, like there is still some bug lurking...
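For context, the checks in question compare the C results elementwise against the PyTorch reference within a tolerance; a simplified sketch of that kind of check (the function name and tolerance values here are illustrative, not the actual test_gpt2.cu code):

```c
// Simplified elementwise tensor comparison with an absolute tolerance,
// in the spirit of the checks in test_gpt2.cu (illustrative only).
#include <math.h>
#include <stdio.h>

// returns 1 if all elements of a and b agree to within `tol`, else 0
int check_tensor_close(const float *a, const float *b, int n, float tol) {
    int ok = 1;
    for (int i = 0; i < n; i++) {
        if (fabsf(a[i] - b[i]) > tol) {
            printf("mismatch at %d: %f vs %f\n", i, a[i], b[i]);
            ok = 0;
        }
    }
    return ok;
}

int main(void) {
    float ours[3] = {1.0f, 2.0f, 3.0f};
    float ref[3]  = {1.0f, 2.0f, 3.001f};
    // "3X the tolerances" just means widening tol, e.g. 1e-3f -> 3e-3f
    printf("pass = %d\n", check_tensor_close(ours, ref, 3, 3e-3f));
    return 0;
}
```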

The layernorms remain in their old places.

@karpathy karpathy changed the title load bf16 directly, re-shuffle position of tensors in ParameterTensors, place all LayerNorms (fp32) at the end load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions Apr 28, 2024
@karpathy karpathy merged commit d95b8d8 into master Apr 28, 2024
3 checks passed