
Refactor llama3 demo to the new generator API #16753

Open — wants to merge 39 commits into main
Conversation

@mtairum (Contributor) commented Jan 15, 2025

Ticket

#16752

What's changed

  • The new Llama3 demo now uses the generator API.
  • Improved prefill performance; e.g. Llama3-70B prefill time is now 182 ms.
  • Improved profiling in the demo.
  • Removed the old text demo and updated CI accordingly.
  • Cleaned up the prompt input files and added missing ones.
  • New benchmark profiling for superset: now includes TTFT and full decode perf over 4096 iterations (for plotting).
  • Added custom input support to the llama3 demo: you can now override any setting for easier testing.
  • Updated PERF.md with the latest numbers.
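The generator API itself isn't shown in this thread; as a rough sketch of the prefill/decode split such an API typically exposes (all names here — `Generator`, `prefill`, `decode`, the toy model — are hypothetical, not the actual tt-metal interface):

```python
# Hypothetical sketch of a prefill/decode-style generator API, loosely
# mirroring the refactor described above. Not the real tt-metal code.

class Generator:
    def __init__(self, model):
        self.model = model  # any callable: (token list) -> next token

    def prefill(self, prompt_tokens):
        # Process the whole prompt in one pass and return the first
        # generated token; the latency of this step is what TTFT measures.
        return self.model(prompt_tokens)

    def decode(self, prompt_tokens, first_token, max_new_tokens):
        # Generate one token per step, feeding each token back in.
        tokens = list(prompt_tokens) + [first_token]
        for _ in range(max_new_tokens - 1):
            tokens.append(self.model(tokens))
        return tokens[len(prompt_tokens):]

# Toy stand-in "model": next token is the last token + 1.
gen = Generator(lambda toks: toks[-1] + 1)
first = gen.prefill([1, 2, 3])
out = gen.decode([1, 2, 3], first, max_new_tokens=4)
print(out)  # [4, 5, 6, 7]
```

The point of the split is that prefill batches the entire prompt (fast, parallel) while decode runs one token at a time, so the two phases can be optimized and profiled separately.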


@mtairum force-pushed the mtairum/llama3_text_demo branch from 4f4c250 to decc3fa on January 17, 2025 at 17:27
@mtairum marked this pull request as ready for review on January 17, 2025 at 17:37
@mtairum self-assigned this on Jan 17, 2025
@mtairum added the llama3 label on Jan 17, 2025
@mtairum requested a review from @skhorasganiTT on January 17, 2025 at 17:37
@skhorasganiTT (Contributor) left a comment:

Reviewed all the files except simple_text_demo.py for now since that one is still being cleaned up

Review threads (outdated, resolved):
  • models/demos/llama3/tt/generator.py (×5)
  • models/demos/llama3/tt/llama_model.py
@mtairum force-pushed the mtairum/llama3_text_demo branch from 943dff3 to 5dfa882 on January 20, 2025 at 12:45
@mtairum force-pushed the mtairum/llama3_text_demo branch from 3847b2b to 4161293 on January 21, 2025 at 14:30
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from e89d7e5 to 82e3d03 on January 24, 2025 at 15:32
@mtairum requested a review from a team as a code owner on January 24, 2025 at 16:30
@tt-rkim (Collaborator) commented Jan 25, 2025

Did you run nightly ttnn single card, t3k demos, tg demos?

@mtairum (Contributor, Author) commented Jan 27, 2025

@tt-rkim In the process of fixing the CI issues. Will only merge after they pass 👍

@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from ec155d5 to 2dc2e9b on January 28, 2025 at 15:06
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from 679a4e2 to 27beb0c on January 29, 2025 at 17:11
@mtairum (Contributor, Author) commented Jan 29, 2025

This PR is now feature-complete, with a batch of new benchmark functionality and QoL updates to the demo so we can better measure whatever the customer team throws at us.

Waiting for CI to pass; then it's good to go.
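The benchmark additions mentioned here and in the description (TTFT plus per-iteration decode perf saved for plotting) can be sketched roughly as follows; `generate_first_token` and `generate_next_token` are hypothetical stand-ins, not the demo's actual functions:

```python
import time

# Hypothetical sketch of recording TTFT and per-iteration decode
# latencies, as the superset benchmark described above might do.

def benchmark(generate_first_token, generate_next_token, iterations=4096):
    t0 = time.perf_counter()
    generate_first_token()
    ttft = time.perf_counter() - t0  # time to first token (prefill + 1 token)

    decode_times = []  # one entry per decode iteration, for plotting
    for _ in range(iterations):
        t = time.perf_counter()
        generate_next_token()
        decode_times.append(time.perf_counter() - t)
    return ttft, decode_times

# Trivial no-op stand-ins just to exercise the harness.
ttft, decode_times = benchmark(lambda: None, lambda: None, iterations=8)
print(len(decode_times))  # 8
```

Keeping the full list of per-iteration decode times (rather than only an average) is what makes the "full decode perf for plotting" possible.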

@yieldthought (Contributor) left a comment:
Clean

Review thread (outdated, resolved): models/demos/llama3/scripts/op_perf_results.py
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from 8424b8f to ac0318e on January 30, 2025 at 17:43
@mtairum (Contributor, Author) commented Jan 30, 2025

Relaunching all tests after addressing @cglagovichTT's comments.

The branch is feature-complete. No further changes expected, except fixes needed to get the CI pipelines passing.

Recent commits (messages truncated):
  • … by reducing the size of rot_mats being passed to prefill
  • …n until it reaches the max. Also added support for custom input parameters
  • …n for 1 user or 32 users and we save that data to superset
@mtairum force-pushed the mtairum/llama3_text_demo branch from 8dc64b3 to a9940a0 on January 31, 2025 at 09:50
@mtairum (Contributor, Author) commented Jan 31, 2025

All CI pipelines are passing.
However, given the higher priority of the Qwen2.5 and DeepSeek distilled models, merging this branch will wait until those hit main, to avoid delays.

Godspeed @yieldthought 🫡
