
load bf16 directly, and some "quality of life" handling of fp32/fp16/bf16 precisions #265

Merged · 12 commits · Apr 28, 2024

Commits on Apr 27, 2024

  1. code to load bf16 weights directly, and also re-wire the position of tensors to put the layernorms at the end. The training loop seems to work OK, the tests pass, and the loss and optimization look OK, but the gradients don't match, which can't be right. So there is a bug, but it's a bit too late in the day for me to debug right now; creating a PR and going to sleep, will fix tomorrow. (A sketch of the direct bf16 load follows this list.)

     karpathy committed Apr 27, 2024 · 09cd67e
  2. 09d935c
  3. d4a642b
  4. e067a27
  5. 9d6fd30
  6. print more in the comparison

     karpathy committed Apr 27, 2024 · a58b8d5
  7. fix a really bad bug in how I was checking the gradients, where I loaded them in the old order, so yeah...

     karpathy committed Apr 27, 2024 · 0062707
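The direct bf16 load in the first commit avoids an fp32 round-trip: the checkpoint's raw bf16 payload is read straight into a 16-bit buffer, and widening to fp32 where needed is a bit-shift, since bf16 is just the top 16 bits of an IEEE-754 float. A minimal sketch in C, assuming a hypothetical checkpoint layout (an int32 header followed by the raw bf16 parameters); the header fields and function names here are illustrative, not llm.c's actual code:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Widen bf16 to fp32: bf16 is a truncated fp32, so this conversion is exact.
static float bf16_to_fp32(uint16_t b) {
    uint32_t bits = (uint32_t)b << 16;
    float f;
    memcpy(&f, &bits, sizeof(f));
    return f;
}

// Hypothetical loader: read the bf16 payload as-is, with no fp32 staging buffer.
uint16_t* load_bf16_weights(const char* path, size_t* num_params_out) {
    FILE* f = fopen(path, "rb");
    if (!f) { fprintf(stderr, "could not open %s\n", path); exit(EXIT_FAILURE); }
    int32_t header[8];                           // assumed: magic, version, counts, ...
    if (fread(header, sizeof(int32_t), 8, f) != 8) { exit(EXIT_FAILURE); }
    size_t num_params = (size_t)header[2];       // assumed: param count lives here
    uint16_t* params = (uint16_t*)malloc(num_params * sizeof(uint16_t));
    if (fread(params, sizeof(uint16_t), num_params, f) != num_params) { exit(EXIT_FAILURE); }
    fclose(f);
    fprintf(stderr, "first param as fp32: %f\n", bf16_to_fp32(params[0])); // sanity check
    *num_params_out = num_params;
    return params;
}
```

Loading this way halves checkpoint I/O and host memory versus an fp32 master copy; the tensor re-ordering done in the same commit is what the later commits walk back.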

Commits on Apr 28, 2024

  1. bring back original ordering. I also had to bump the thresholds by 3X for some tensors and I don't exactly know why, sad.

     karpathy committed Apr 28, 2024 · 9a91b40
  2. adjust comment

     karpathy committed Apr 28, 2024 · 82d7907
  3. 4f7d8d9
  4. a3f5ad9
  5. profile and test only use bf16, but the train script can be run with fp32, bf16, or fp16; fp16 will error, though. (A sketch of such a precision switch follows this list.)

     karpathy committed Apr 28, 2024 · 9d70d9a
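The last commit pins profile and test to bf16 while letting the train script pick any of the three precisions, with fp16 rejected. A minimal sketch of how such a switch could look in CUDA C, assuming a compile-time define chooses the parameter type; the ENABLE_* macro names, the floatX typedef, and the error message are assumptions for illustration, not the PR's actual code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Hypothetical switch: the build (or a wrapper script) defines at most one of
// ENABLE_FP32 / ENABLE_FP16; otherwise bf16 is the default, matching the
// commit's note that the profile and test binaries only use bf16.
#if defined(ENABLE_FP32)
typedef float floatX;
#define PRECISION_MODE "fp32"
#elif defined(ENABLE_FP16)
typedef half floatX;
#define PRECISION_MODE "fp16"
#else
typedef __nv_bfloat16 floatX;
#define PRECISION_MODE "bf16"
#endif

int main(void) {
    // fp16 compiles but is rejected up front ("fp16 will error, though"):
    // it would need loss scaling that this sketch does not implement.
    if (strcmp(PRECISION_MODE, "fp16") == 0) {
        fprintf(stderr, "error: fp16 is not supported yet, use fp32 or bf16\n");
        exit(EXIT_FAILURE);
    }
    printf("training with %s parameters, sizeof(floatX) = %zu\n",
           PRECISION_MODE, sizeof(floatX));
    return EXIT_SUCCESS;
}
```

Defaulting to bf16 keeps a single code path (`floatX`) through the kernels while still allowing an fp32 build for debugging, which is a common way to handle precision variants in a single-file trainer.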