
Llama 3.x integration using new setup.sh and LLM code base #164

Merged
merged 6 commits into dev
Jan 31, 2025

Conversation

tstescoTT
Contributor

Change log

  • support multiple models using the same container; add support for a MODEL_ID environment variable in tt-inference-server (see the sketch after this list)
  • update volume initialization for the new file permissions strategy
  • add SetupTypes to handle different first-run and validation behaviour
  • use hf_model_id to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
  • change /home/user/cache_root to /home/container_app_user/cache_root
  • fix get_devices_mounts, add mapping
  • use MODEL_ID, if present in the container env_vars, to map to the impl model config
  • set defaults for ModelImpl
  • add configs for Llama 3.x models
  • remove HF_TOKEN from the tt-studio .env for ease of setup
  • add environment file processing

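A hedged sketch of the MODEL_ID flow described above: the image path appears in this PR's discussion, but the image tag and the example MODEL_ID value are placeholders for illustration, not values defined in this PR.

```bash
# Select which model a single shared tt-inference-server image serves by
# passing MODEL_ID into the container environment.
# NOTE: the tag and the MODEL_ID value below are illustrative placeholders.
docker run --rm \
  -e MODEL_ID=meta-llama/Llama-3.1-70B-Instruct \
  ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:<tag>
```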
@tstescoTT tstescoTT requested review from anirudTT and bgoelTT January 31, 2025 05:38
Contributor

@bgoelTT bgoelTT left a comment


Just had one tiny comment. Will start testing now

@bgoelTT
Contributor

bgoelTT commented Jan 31, 2025

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.
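For reference, the standard way to authenticate Docker against ghcr.io for a private image (the username placeholder and the GHCR_PAT variable name are illustrative):

```bash
# Log in to GitHub Container Registry; the password is a GitHub personal
# access token with the read:packages scope.
echo "$GHCR_PAT" | docker login ghcr.io -u <github-username> --password-stdin
```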

Contributor

@anirudTT anirudTT left a comment


LGTM! Works for running 3.1 70B Instruct. Will test more of the Llama herd next week.

@anirudTT
Contributor

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.

I did not run into this, not sure why.

Contributor

@bgoelTT bgoelTT left a comment


The instructions in HowToRun_vLLM_Models.md also need to be updated to:

  • reflect the new .env procedure (no more HF_TOKEN and no more copying the file)
  • reflect how the process works for the Llama herd

Other than that, I was able to run all the new Llama herd models! (didn't try 3.3-70B)

@tstescoTT
Contributor Author

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.

I did not run into this, not sure why.

It should be public now.

@tstescoTT
Contributor Author

tstescoTT commented Jan 31, 2025

The instructions in HowToRun_vLLM_Models.md also need to be updated to:

* reflect the new .env procedure (no more HF_TOKEN and no more copying the file)

* reflect how the process works for the Llama herd

Other than that, I was able to run all the new Llama herd models! (didn't try 3.3-70B)

Added #165 to track this; we can address that in another PR to dev.

Contributor

@bgoelTT bgoelTT left a comment


Sounds good re #165

@tstescoTT tstescoTT merged commit f820f6f into dev Jan 31, 2025
anirudTT pushed a commit that referenced this pull request Feb 5, 2025
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist (sketched after this commit message)
* startup.sh uses safety set -euo pipefail
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE, now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server.
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing
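A minimal sketch of the startup.sh hardening listed above; only the HOST_PERSISTENT_STORAGE_VOLUME name comes from the commit message, the rest is assumed for illustration.

```bash
#!/usr/bin/env bash
# Fail fast: exit on any error, on use of an unset variable, and on a failure
# anywhere in a pipeline.
set -euo pipefail

# HOST_PERSISTENT_STORAGE_VOLUME is named in the commit message; how it is
# populated is assumed here.
: "${HOST_PERSISTENT_STORAGE_VOLUME:?must be set}"

# Create the host persistent storage directory if it does not already exist.
mkdir -p "${HOST_PERSISTENT_STORAGE_VOLUME}"
```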
@anirudTT anirudTT mentioned this pull request Feb 5, 2025
anirudTT added a commit that referenced this pull request Feb 24, 2025
* Llama 3.x integration using new setup.sh and LLM code base (#164)
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist
* startup.sh uses safety set -euo pipefail
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server.
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing

* adds license headers

* Anirud/update vllm setup steps (#189)

* update readme to reflect new flow

* fix readme issues
* add Supported models tab:
  - pointing to tt-inference-server readme
* docs: Update main readme
- add better quick start guide 
- add better notes for running in development mode
* docs: re add Mock model steps
* docs: fix links
* docs: fix vllm
* Update HowToRun_vLLM_Models.md
* Update HowToRun_vLLM_Models.md

Co-authored-by: Tom Stesco <tstesco@tenstorrent.com>
Co-authored-by: Benjamin Goel <bgoel@tenstorrent.com>