Inconsistent initialization of RoPE embedding across component builders #2283
Labels
- best practice: Things we should be doing but aren't
- better engineering: Tasks which help improve eng productivity, e.g. building tools, cleaning up code, writing docs
Context
The Llama 3.1 self-attention builder takes RoPE embeddings as an argument, which lets us construct RoPE a single time and share it across all layers. The corresponding components for Llama2 and Llama3 do not do this: they construct a new RoPE module for every single layer.
Should we use a single global RoPE module or one RoPE per layer? Either way, we should standardize this across the component builders.
Originally posted by @ebsmothers in #2282 (comment)
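For illustration, here is a minimal sketch of the two construction patterns being compared. The names (`RoPE`, `build_attention`, `build_layers_*`) are hypothetical placeholders, not torchtune's actual builders; only the wiring of the positional-embedding argument is the point.

```python
import torch
import torch.nn as nn


class RoPE(nn.Module):
    """Hypothetical rotary positional embedding, used only to illustrate wiring."""

    def __init__(self, dim: int, max_seq_len: int = 4096) -> None:
        super().__init__()
        # A real RoPE would precompute a rotation cache of shape (max_seq_len, dim // 2) here.
        self.register_buffer("cache", torch.zeros(max_seq_len, dim // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x  # identity stand-in; a real implementation rotates q/k


def build_attention(head_dim: int, pos_embeddings: nn.Module) -> nn.Module:
    # Hypothetical attention builder; only the pos_embeddings argument matters here.
    return nn.ModuleDict(
        {"pos_embeddings": pos_embeddings, "qkv": nn.Linear(head_dim, 3 * head_dim)}
    )


# Pattern A (Llama 3.1 style): construct RoPE once and pass the same module to every layer.
def build_layers_shared(num_layers: int, head_dim: int) -> nn.ModuleList:
    rope = RoPE(dim=head_dim)
    return nn.ModuleList(
        build_attention(head_dim, pos_embeddings=rope) for _ in range(num_layers)
    )


# Pattern B (Llama2/Llama3 style): every layer constructs its own RoPE instance.
def build_layers_per_layer(num_layers: int, head_dim: int) -> nn.ModuleList:
    return nn.ModuleList(
        build_attention(head_dim, pos_embeddings=RoPE(dim=head_dim))
        for _ in range(num_layers)
    )
```

Sharing a single module means the rotation cache is allocated once rather than `num_layers` times; per-layer construction duplicates identical buffers and makes it possible for the RoPE configuration to drift between layers.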