usage of zero_grad() #46
In lecture 3.5, the instructor said we can use [...]. Could I ask for a simple introduction to using [...]?
Replies: 1 comment
Good point! So, `zero_grad` resets the gradients from the previous iteration. There are sometimes cases where you don't want to reset. E.g., one such case is gradient accumulation, which is useful when your computer can't handle large batch sizes. I have a write-up about it here: https://sebastianraschka.com/blog/2023/llm-grad-accumulation.html

PS: Please feel free to ask follow-up questions. Happy to chat about this more!
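To make the gradient accumulation case concrete, here is a minimal sketch in PyTorch. It is not the code from the lecture or the write-up; the toy data, the `Linear` model, and the name `accumulation_steps` are assumptions chosen for illustration. It first shows that calling `backward()` twice without `zero_grad()` adds the gradients together, and then uses that behavior on purpose to process one batch of 8 examples as 4 micro-batches of 2.

```python
import torch

torch.manual_seed(0)

# Toy data and model (hypothetical, for illustration only)
X = torch.randn(8, 3)
y = torch.randn(8, 1)
model = torch.nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

# 1) Without zero_grad(), each backward() call adds to the existing .grad
loss_fn(model(X), y).backward()
grad_once = model.weight.grad.clone()
loss_fn(model(X), y).backward()
grad_twice = model.weight.grad.clone()  # now holds the sum of both passes

optimizer.zero_grad()  # reset before the next experiment

# 2) Gradient accumulation: 4 micro-batches of 2 instead of 1 batch of 8
accumulation_steps = 4
micro_batches = zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps))

for step, (xb, yb) in enumerate(micro_batches, start=1):
    # Scale the loss so the summed gradients match the full-batch average
    loss = loss_fn(model(xb), yb) / accumulation_steps
    loss.backward()  # gradients accumulate across micro-batches

    if step % accumulation_steps == 0:
        accumulated_grad = model.weight.grad.clone()
        optimizer.step()       # one weight update for all micro-batches
        optimizer.zero_grad()  # reset only after the update
```

Because the micro-batches here are equally sized and the loss is a mean, `accumulated_grad` should match `grad_once`, the gradient of the full batch, up to floating-point error. The only change from a regular training loop is that `optimizer.step()` and `optimizer.zero_grad()` run every `accumulation_steps` iterations instead of every iteration.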