Llama 3.x integration using new setup.sh and LLM code base #164
Conversation
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing
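As a rough sketch of the multi-model / MODEL_ID flow described above (the image tag, model id, and host volume path below are illustrative placeholders, not the exact invocation produced by setup.sh):

```bash
# Hypothetical illustration: one inference-server image serves different Llama 3.x
# models by injecting MODEL_ID into the container environment; the server maps it
# to the matching impl model config. Tag, model id, and host path are placeholders.
docker run --rm \
  -e MODEL_ID="meta-llama/Llama-3.1-70B-Instruct" \
  -v "${HOST_PERSISTENT_STORAGE_VOLUME}:/home/container_app_user/cache_root" \
  ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:<tag>
```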
Just had one tiny comment. Will start testing now.
Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run llama-3.2-1B, then examined the logs.
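For reference, authenticating to GitHub Container Registry before pulling looks roughly like this (username and token are placeholders, a personal access token with read:packages scope is assumed, and the image path is inferred from the package URL above; the tag is a placeholder):

```bash
# Log in to ghcr.io so the (then-private) image can be pulled.
echo "$GITHUB_PAT" | docker login ghcr.io -u <github-username> --password-stdin
docker pull ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:<tag>
```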
LGTM! Works for running 3.1 70B Instruct. Will test more of the Llama herd next week.
I did not run into this, not sure why.
The instructions in HowToRun_vLLM_Models.md also need to be updated to:
- reflect the new .env procedure (no more HF_TOKEN and no more copying the file)
- reflect how the process works for the Llama herd
Other than that, I was able to run all of the new Llama herd models! (didn't try 3.3-70B)
It should be public now.
Added an issue to track this: #165. We can add that in another PR to dev.
Sounds good re #165
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist
* startup.sh uses set -euo pipefail for safety
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE, now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing
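A minimal sketch of the startup.sh behaviour listed above (assuming HOST_PERSISTENT_STORAGE_VOLUME is provided via the environment or the generated .env file; this is not the actual script):

```bash
#!/usr/bin/env bash
# Fail fast on errors, unset variables, and failed pipeline stages.
set -euo pipefail

# Create the persistent storage directory if it does not already exist.
mkdir -p "${HOST_PERSISTENT_STORAGE_VOLUME}"
```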
* Llama 3.x integration using new setup.sh and LLM code base (#164)
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist
* startup.sh uses set -euo pipefail for safety
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE, now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing
* adds license headers
* Anirud/update vllm setup steps (#189)
* update readme to reflect new flow
* fix readme issues
* add Supported models tab: pointing to tt-inference-server readme
* docs: Update main readme - add better quick start guide - add better notes for running in development mode
* docs: re-add Mock model steps
* docs: fix links
* docs: fix vllm
* Update HowToRun_vLLM_Models.md
* Update HowToRun_vLLM_Models.md

Co-authored-by: Tom Stesco <tstesco@tenstorrent.com>
Co-authored-by: Benjamin Goel <bgoel@tenstorrent.com>
change log