
DS Chat Step 3 - Add separate Lora Adam optimizer group #633

Merged (3 commits into master, Jul 12, 2023)

Conversation

lekurile (Contributor) commented on Jul 10, 2023

This PR adds a separate LoRA Adam optimizer group (for the lora_right_weight and lora_left_weight params) with a LoRA-specific learning rate of lr=5e-4. After this change, Step 3 training convergence with LoRA enabled improved across various configurations when using ZeRO stage 2.

Thanks to @yaozhewei for the insight!
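
For readers skimming the PR, here is a minimal sketch of the idea: collect the LoRA parameters into their own optimizer parameter group and attach a group-level learning rate. The helper name, the weight-decay split, and the no_decay filter below are illustrative assumptions and not the exact PR diff; only the lora_right_weight / lora_left_weight parameter names and the 5e-4 LoRA learning rate come from this PR.

```python
# Minimal sketch (assumed helper, not the exact PR diff): split trainable
# parameters into groups so LoRA weights get their own learning rate.
def get_optimizer_grouped_parameters(model,
                                     weight_decay,
                                     lora_lr=5e-4,
                                     no_decay=("bias", "LayerNorm.weight")):
    decay_params, no_decay_params, lora_params = [], [], []

    for name, param in model.named_parameters():
        if not param.requires_grad:
            continue
        # Parameter names from the PR description: the LoRA adapter matrices.
        if "lora_right_weight" in name or "lora_left_weight" in name:
            lora_params.append(param)
        elif any(nd in name for nd in no_decay):
            no_decay_params.append(param)
        else:
            decay_params.append(param)

    return [
        {"params": decay_params, "weight_decay": weight_decay},
        {"params": no_decay_params, "weight_decay": 0.0},
        # Separate group: LoRA params override the optimizer's default lr.
        {"params": lora_params, "weight_decay": weight_decay, "lr": lora_lr},
    ]
```

torch.optim.AdamW (and DeepSpeed's Adam variants) apply the optimizer's default learning rate to any group that does not set its own "lr" key, so only the LoRA adapters pick up the higher 5e-4 rate while the remaining groups keep the configured base rate.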

BEFORE: (Step 3 convergence plot)

AFTER: (Step 3 convergence plot)

lekurile merged commit b093f58 into master on Jul 12, 2023 (2 checks passed)
LeetJoe pushed a commit to LeetJoe/DeepSpeedExamples that referenced this pull request Sep 15, 2023