If you try to use APOLLO to train, for example, Llama 3 1B, which has tied embeddings (i.e. the `embed_tokens` and `lm_head` layers share the same underlying parameters), it fails to load the optimizer with `ValueError: some parameters appear in more than one parameter group`.
For now you can just manually exclude the tied embedding layers from the optimizer (see the sketch below), but it would be nice if this worked properly out of the box :3
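
For reference, here is a minimal sketch of both the failure and the workaround, assuming a standard PyTorch parameter-group setup. The toy `TinyTiedLM` model stands in for Llama 3 1B, and `torch.optim.AdamW` stands in for the APOLLO optimizer; the module-walking group construction is an assumption about how the collision arises, not APOLLO's exact code:

```python
import torch
import torch.nn as nn

# Toy stand-in for a tied-embedding model like Llama 3 1B.
class TinyTiedLM(nn.Module):
    def __init__(self, vocab: int = 100, dim: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)
        self.lm_head = nn.Linear(dim, vocab, bias=False)
        self.lm_head.weight = self.embed_tokens.weight  # weight tying

model = TinyTiedLM()

# A module-walking group setup puts the shared tensor into two groups,
# which PyTorch's optimizer constructor rejects.
linear_w = [m.weight for m in model.modules() if isinstance(m, nn.Linear)]
embed_w = [m.weight for m in model.modules() if isinstance(m, nn.Embedding)]
try:
    torch.optim.AdamW([{"params": linear_w}, {"params": embed_w}])
except ValueError as e:
    print(e)  # "some parameters appear in more than one parameter group"

# Workaround: deduplicate by tensor identity so the tied weight is only
# registered in the first group that claims it.
seen = set()
def dedup(params):
    out = []
    for p in params:
        if id(p) not in seen:
            seen.add(id(p))
            out.append(p)
    return out

optimizer = torch.optim.AdamW(
    [{"params": dedup(linear_w)}, {"params": dedup(embed_w)}]
)
```

An `id()`-based filter like this keeps the tied weight in whichever group is built first; if APOLLO treats `lm_head` specially (e.g. low-rank projection), the fix would presumably also need to decide which group the shared tensor should belong to.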