usage of zero_grad() #46

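Good point! zero_grad() resets the gradients left over from the previous iteration. PyTorch adds new gradients to whatever is already stored in each parameter's .grad on every backward() call, so without the reset the old and new gradients would be summed. There are cases where you deliberately don't reset on every iteration, though. One such case is gradient accumulation, which is useful when your hardware doesn't have enough memory for large batch sizes. I have a write-up about it here: https://sebastianraschka.com/blog/2023/llm-grad-accumulation.html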
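To make the accumulation case concrete, here is a minimal sketch rather than the code from the write-up. The toy data, the `Linear(3, 1)` model, and the choice of 4 micro-batches are all assumptions made for illustration. It checks that skipping zero_grad() between micro-batches, while scaling each loss by the number of accumulation steps, should give the same gradient as one pass over the full batch:

```python
import torch

torch.manual_seed(0)

# Toy data and model; all names and sizes here are illustrative assumptions.
X = torch.randn(8, 3)
y = torch.randn(8, 1)
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()  # default reduction is the mean over the batch

# Reference: gradient from a single full batch of 8 examples.
model.zero_grad()
loss_fn(model(X), y).backward()
full_batch_grad = model.weight.grad.clone()

# Gradient accumulation: 4 micro-batches of 2 examples each.
accumulation_steps = 4
model.zero_grad()  # reset once, before the accumulation window starts
for X_micro, y_micro in zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps)):
    # Scale the loss so the summed gradients match the full-batch mean.
    loss = loss_fn(model(X_micro), y_micro) / accumulation_steps
    loss.backward()  # adds to .grad because zero_grad() is not called here
accumulated_grad = model.weight.grad.clone()

# True if both approaches produced the same gradient (up to float tolerance).
grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
```

In a real training loop you would call optimizer.step() followed by optimizer.zero_grad() once per accumulation window, instead of once per micro-batch.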

PS: Please feel free to ask follow-up questions. Happy to chat about this more!

Answer selected by rasbt