usage of zero_grad() #46

jihobak · 2023-05-11T07:49:04Z

In lecture 3.5, the instructor said that the zero_grad() method can be used for advanced cases and that it will be covered later.

Could I ask for a brief introduction to using zero_grad()?

rasbt · 2023-05-11T18:23:45Z

Good point! So, zero_grad() resets the gradients from the previous iteration. PyTorch adds newly computed gradients to the existing ones on each backward() call, so without the reset they would keep accumulating across iterations. Sometimes you don't want to reset, though. One such case is gradient accumulation, which is useful when your computer can't handle large batch sizes. I have a write-up about it here: https://sebastianraschka.com/blog/2023/llm-grad-accumulation.html
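
To make the gradient accumulation case concrete, here is a minimal sketch in PyTorch. It is not taken from the lecture or the linked write-up: the toy data, the model, and the names `accumulation_steps`, `X_micro`, and `y_micro` are assumptions made up for illustration. It compares the gradient from one batch of 8 examples with the gradient accumulated over 4 micro-batches of 2, where zero_grad() is called only after the weight update.

```python
import torch

torch.manual_seed(0)

# Toy data and model; all names here are illustrative assumptions.
X, y = torch.randn(8, 3), torch.randn(8, 1)
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
accumulation_steps = 4  # 4 micro-batches of 2 examples each

# Reference: gradient of the loss on the full batch of 8.
model.zero_grad()
loss_fn(model(X), y).backward()
full_batch_grad = model.weight.grad.clone()

# Accumulation: skip zero_grad() between backward() calls so .grad sums up.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad()
micro_batches = zip(X.chunk(accumulation_steps), y.chunk(accumulation_steps))
for step, (X_micro, y_micro) in enumerate(micro_batches, start=1):
    # Scale so the summed gradients match the mean over the full batch.
    loss = loss_fn(model(X_micro), y_micro) / accumulation_steps
    loss.backward()
    if step % accumulation_steps == 0:
        accumulated_grad = model.weight.grad.clone()
        optimizer.step()       # one weight update per 4 micro-batches
        optimizer.zero_grad()  # reset only after the update

grads_match = torch.allclose(full_batch_grad, accumulated_grad, atol=1e-6)
```

In this sketch `grads_match` should come out true, because dividing each micro-batch loss by `accumulation_steps` makes the summed gradients equal to the gradient of the mean loss over all 8 examples. In a regular training loop without accumulation, you would call zero_grad() once per iteration instead.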

PS: Please feel free to ask follow-up questions. Happy to chat about this more!
