
Refactor llama3 demo to the new generator API #16753

Open — wants to merge 39 commits into main
Conversation

@mtairum (Contributor) commented Jan 15, 2025

Ticket

#16752

What's changed

  • The new Llama3 demo now uses the generator API.
  • Improved prefill performance; e.g. Llama3-70B prefill time is now 182 ms.
  • Improved profiling in the demo.
  • Removed the old text demo and updated CI accordingly.
  • Cleaned up the prompt input files and added missing ones.
  • New benchmark profiling for superset: now includes TTFT and full decode perf over 4096 iterations (for plotting).
  • Added custom input support to the llama3 demo: you can now override any setting for easier testing.
  • Updated PERF.md with the latest numbers.
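The generator API itself isn't shown in this thread; as a rough sketch of the prefill/decode split such an API typically exposes (all names here — `Generator`, `prefill`, `decode`, the toy model — are hypothetical, not the actual tt-metal interface):

```python
# Hypothetical sketch of a prefill/decode-style generator API, loosely
# mirroring the refactor described above. Not the real tt-metal code.

class Generator:
    def __init__(self, model):
        self.model = model  # any callable: (token list) -> next token

    def prefill(self, prompt_tokens):
        # Process the whole prompt in one pass and return the first
        # generated token; the latency of this step is what TTFT measures.
        return self.model(prompt_tokens)

    def decode(self, prompt_tokens, first_token, max_new_tokens):
        # Generate one token per step, feeding each token back in.
        tokens = list(prompt_tokens) + [first_token]
        for _ in range(max_new_tokens - 1):
            tokens.append(self.model(tokens))
        return tokens[len(prompt_tokens):]

# Toy stand-in "model": next token is the last token + 1.
gen = Generator(lambda toks: toks[-1] + 1)
first = gen.prefill([1, 2, 3])
out = gen.decode([1, 2, 3], first, max_new_tokens=4)
print(out)  # [4, 5, 6, 7]
```

The point of the split is that prefill batches the entire prompt (fast, parallel) while decode runs one token at a time, so the two phases can be optimized and profiled separately.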


@mtairum force-pushed the mtairum/llama3_text_demo branch from 4f4c250 to decc3fa on January 17, 2025 at 17:27
@mtairum marked this pull request as ready for review on January 17, 2025 at 17:37
@mtairum self-assigned this on Jan 17, 2025
@mtairum added the llama3 label on Jan 17, 2025
@mtairum requested a review from @skhorasganiTT on January 17, 2025 at 17:37
@skhorasganiTT (Contributor) left a comment:

Reviewed all the files except simple_text_demo.py for now since that one is still being cleaned up

Review threads (outdated, resolved):
  • models/demos/llama3/tt/generator.py (×5)
  • models/demos/llama3/tt/llama_model.py
@mtairum force-pushed the mtairum/llama3_text_demo branch from 943dff3 to 5dfa882 on January 20, 2025 at 12:45
@mtairum force-pushed the mtairum/llama3_text_demo branch from 3847b2b to 4161293 on January 21, 2025 at 14:30
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from e89d7e5 to 82e3d03 on January 24, 2025 at 15:32
@mtairum requested a review from a team as a code owner on January 24, 2025 at 16:30
@tt-rkim (Collaborator) commented Jan 25, 2025

Did you run nightly ttnn single card, t3k demos, tg demos?

@mtairum (Contributor, Author) commented Jan 27, 2025

@tt-rkim In the process of fixing the CI issues. Will only merge after they pass 👍

@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from ec155d5 to 2dc2e9b on January 28, 2025 at 15:06
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from 679a4e2 to 27beb0c on January 29, 2025 at 17:11
@mtairum (Contributor, Author) commented Jan 29, 2025

This PR is now feature-complete, with a batch of new benchmark functionality and QoL updates to the demo so we can better measure whatever the customer team throws at us.

Waiting for CI to pass; then it's good to go.
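The benchmark additions mentioned here and in the description (TTFT plus per-iteration decode perf saved for plotting) can be sketched roughly as follows; `generate_first_token` and `generate_next_token` are hypothetical stand-ins, not the demo's actual functions:

```python
import time

# Hypothetical sketch of recording TTFT and per-iteration decode
# latencies, as the superset benchmark described above might do.

def benchmark(generate_first_token, generate_next_token, iterations=4096):
    t0 = time.perf_counter()
    generate_first_token()
    ttft = time.perf_counter() - t0  # time to first token (prefill + 1 token)

    decode_times = []  # one entry per decode iteration, for plotting
    for _ in range(iterations):
        t = time.perf_counter()
        generate_next_token()
        decode_times.append(time.perf_counter() - t)
    return ttft, decode_times

# Trivial no-op stand-ins just to exercise the harness.
ttft, decode_times = benchmark(lambda: None, lambda: None, iterations=8)
print(len(decode_times))  # 8
```

Keeping the full list of per-iteration decode times (rather than only an average) is what makes the "full decode perf for plotting" possible.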

@yieldthought (Contributor) left a comment:
Clean

Review thread (outdated, resolved): models/demos/llama3/scripts/op_perf_results.py
@mtairum force-pushed the mtairum/llama3_text_demo branch 2 times, most recently from 8424b8f to ac0318e on January 30, 2025 at 17:43
@mtairum (Contributor, Author) commented Jan 30, 2025

Relaunching all tests after addressing @cglagovichTT's comments.

The branch is feature-complete. No further changes expected, except fixes needed to get the CI pipelines passing.

Recent commits (messages truncated):
  • … by reducing the size of rot_mats being passed to prefill
  • …n until it reaches the max. Also added support for custom input parameters
  • …n for 1 user or 32 users and we save that data to superset
@mtairum force-pushed the mtairum/llama3_text_demo branch from 8dc64b3 to a9940a0 on January 31, 2025 at 09:50
@mtairum (Contributor, Author) commented Jan 31, 2025

All CI pipelines are passing.
However, given the higher priority of the Qwen2.5 and DeepSeek distilled models, merging this branch will wait until those hit main, to avoid delays.

Godspeed @yieldthought 🫡
