NM Transformers v0.10

@markurtz released this 24 Jan 19:37
c7b33f0
Fix incorrect steps calculation when gradient acc. (#31)

When gradient accumulation is used, the effective batch size is `gradient_accumulation_steps` times larger than the per-step batch size.
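As a rough illustration of why the step count changes, here is a minimal sketch. It is not the code from this release: the helper names `effective_batch_size` and `optimizer_steps_per_epoch` are hypothetical, and the rounding choice (floor division, with a minimum of one step) is an assumption.

```python
import math


def effective_batch_size(per_step_batch_size: int,
                         gradient_accumulation_steps: int = 1) -> int:
    """Number of samples contributing to one optimizer update."""
    return per_step_batch_size * gradient_accumulation_steps


def optimizer_steps_per_epoch(num_samples: int,
                              per_step_batch_size: int,
                              gradient_accumulation_steps: int = 1) -> int:
    """Optimizer updates per epoch (hypothetical helper, not the library's API).

    Assumption: a trailing partial accumulation window is dropped
    (floor division), but at least one update is always counted.
    """
    # Forward/backward passes per epoch, counting a final partial batch.
    num_batches = math.ceil(num_samples / per_step_batch_size)
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    return max(num_batches // gradient_accumulation_steps, 1)


if __name__ == "__main__":
    print(effective_batch_size(8, 4))             # samples per update
    print(optimizer_steps_per_epoch(1024, 8, 1))  # no accumulation
    print(optimizer_steps_per_epoch(1024, 8, 4))  # with accumulation
```

Any schedule derived from the total number of steps (for example a learning-rate schedule) would need to use the accumulated count rather than the raw batch count.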