Inconsistent initialization of RoPE embedding across component builders #2283
Labels
- best practice: Things we should be doing but aren't
- better engineering: Tasks which help improve eng productivity, e.g. building tools, cleaning up code, writing docs
Context
The Llama 3.1 self-attention builder takes RoPE embeddings as an argument, which lets us construct RoPE a single time and share it across all layers. The corresponding components for Llama2 and Llama3 do not do this: they construct a new RoPE module for every single layer.
Should we use a single global RoPE module or one RoPE per layer? Either way, we should standardize this across the component builders.
Originally posted by @ebsmothers in #2282 (comment)
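For illustration, here is a minimal sketch of the two construction patterns being compared. The names (`RoPE`, `build_attention`, `build_layers_*`) are hypothetical placeholders, not torchtune's actual builders; only the wiring of the positional-embedding argument is the point.

```python
import torch
import torch.nn as nn


class RoPE(nn.Module):
    """Hypothetical rotary positional embedding, used only to illustrate wiring."""

    def __init__(self, dim: int, max_seq_len: int = 4096) -> None:
        super().__init__()
        # A real RoPE would precompute a rotation cache of shape (max_seq_len, dim // 2) here.
        self.register_buffer("cache", torch.zeros(max_seq_len, dim // 2))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x  # identity stand-in; a real implementation rotates q/k


def build_attention(head_dim: int, pos_embeddings: nn.Module) -> nn.Module:
    # Hypothetical attention builder; only the pos_embeddings argument matters here.
    return nn.ModuleDict(
        {"pos_embeddings": pos_embeddings, "qkv": nn.Linear(head_dim, 3 * head_dim)}
    )


# Pattern A (Llama 3.1 style): construct RoPE once and pass the same module to every layer.
def build_layers_shared(num_layers: int, head_dim: int) -> nn.ModuleList:
    rope = RoPE(dim=head_dim)
    return nn.ModuleList(
        build_attention(head_dim, pos_embeddings=rope) for _ in range(num_layers)
    )


# Pattern B (Llama2/Llama3 style): every layer constructs its own RoPE instance.
def build_layers_per_layer(num_layers: int, head_dim: int) -> nn.ModuleList:
    return nn.ModuleList(
        build_attention(head_dim, pos_embeddings=RoPE(dim=head_dim))
        for _ in range(num_layers)
    )
```

Sharing a single module means the rotation cache is allocated once rather than `num_layers` times; per-layer construction duplicates identical buffers and makes it possible for the RoPE configuration to drift between layers.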