
Llama 3.x integration using new setup.sh and LLM code base #164

Merged
merged 6 commits into dev
Jan 31, 2025

Conversation

tstescoTT
Contributor

Change log

  • support multiple models using the same container; add support for a MODEL_ID environment variable in tt-inference-server (see the sketch after this list)
  • update volume initialization for the new file permissions strategy
  • add SetupTypes to handle different first-run and validation behaviour
  • use hf_model_id to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
  • change /home/user/cache_root to /home/container_app_user/cache_root
  • fix get_devices_mounts, add mapping
  • use MODEL_ID, if present in the container env_vars, to map to the impl model config
  • set defaults for ModelImpl
  • add configs for Llama 3.x models
  • remove HF_TOKEN from the tt-studio .env for ease of setup
  • add environment file processing

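A hedged sketch of the MODEL_ID flow described above: the image path appears in this PR's discussion, but the image tag and the example MODEL_ID value are placeholders for illustration, not values defined in this PR.

```bash
# Select which model a single shared tt-inference-server image serves by
# passing MODEL_ID into the container environment.
# NOTE: the tag and the MODEL_ID value below are illustrative placeholders.
docker run --rm \
  -e MODEL_ID=meta-llama/Llama-3.1-70B-Instruct \
  ghcr.io/tenstorrent/tt-inference-server/vllm-llama3-src-dev-ubuntu-20.04-amd64:<tag>
```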
@tstescoTT tstescoTT requested review from anirudTT and bgoelTT January 31, 2025 05:38
Contributor

@bgoelTT bgoelTT left a comment


Just had one tiny comment. Will start testing now

@bgoelTT
Contributor

bgoelTT commented Jan 31, 2025

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.
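For reference, the standard way to authenticate Docker against ghcr.io for a private image (the username placeholder and the GHCR_PAT variable name are illustrative):

```bash
# Log in to GitHub Container Registry; the password is a GitHub personal
# access token with the read:packages scope.
echo "$GHCR_PAT" | docker login ghcr.io -u <github-username> --password-stdin
```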

Contributor

@anirudTT anirudTT left a comment


LGTM! Works for running 3.1 70B Instruct. Will test more of the Llama herd next week.

@anirudTT
Contributor

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.

I did not run into this, not sure why.

Contributor

@bgoelTT bgoelTT left a comment


The instructions in HowToRun_vLLM_Models.md also need to be updated to:

  • reflect the new .env procedure (no more HF_TOKEN and no more copying the file)
  • reflect how the process works for the Llama herd

Other than that, I was able to run all the new Llama herd models! (didn't try 3.3-70B)

@tstescoTT
Contributor Author

Oh btw, the https://github.com/orgs/tenstorrent/packages/container/package/tt-inference-server%2Fvllm-llama3-src-dev-ubuntu-20.04-amd64 container is private, so I needed to log in to ghcr. I first hit a pull error when trying to run Llama-3.2-1B and then examined the logs.

I did not run into this, not sure why.

It should be public now.

@tstescoTT
Contributor Author

tstescoTT commented Jan 31, 2025

The instructions in HowToRun_vLLM_Models.md also need to be updated to:

* reflect the new .env procedure (no more HF_TOKEN and no more copying the file)

* reflect how the process works for the Llama herd

Other than that, I was able to run all the new Llama herd models! (didn't try 3.3-70B)

Added #165 to track this; we can address that in another PR to dev.

Contributor

@bgoelTT bgoelTT left a comment


Sounds good re #165

@tstescoTT tstescoTT merged commit f820f6f into dev Jan 31, 2025
anirudTT pushed a commit that referenced this pull request Feb 5, 2025
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist (sketched after this commit message)
* startup.sh uses safety set -euo pipefail
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE, now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server.
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing
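A minimal sketch of the startup.sh hardening listed above; only the HOST_PERSISTENT_STORAGE_VOLUME name comes from the commit message, the rest is assumed for illustration.

```bash
#!/usr/bin/env bash
# Fail fast: exit on any error, on use of an unset variable, and on a failure
# anywhere in a pipeline.
set -euo pipefail

# HOST_PERSISTENT_STORAGE_VOLUME is named in the commit message; how it is
# populated is assumed here.
: "${HOST_PERSISTENT_STORAGE_VOLUME:?must be set}"

# Create the host persistent storage directory if it does not already exist.
mkdir -p "${HOST_PERSISTENT_STORAGE_VOLUME}"
```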
@anirudTT anirudTT mentioned this pull request Feb 5, 2025
anirudTT added a commit that referenced this pull request Feb 24, 2025
* Llama 3.x integration using new setup.sh and LLM code base (#164)
* remove HF token from .env in tt-studio
* startup.sh makes HOST_PERSISTENT_STORAGE_VOLUME if it doesn't exist
* startup.sh uses safety set -euo pipefail
* remove HF_TOKEN from app/docker-compose.yml
* remove VLLM_LLAMA31_ENV_FILE now redundant
* Adding Llama 3.x integration using new setup.sh and LLM code base
* support multiple models using same container, adds support for MODEL_ID environment variable in tt-inference-server.
* update volume initialization for new file permissions strategy
* add SetupTypes to handle different first run and validation behaviour
* hf_model_id is used to define model_id and model_name if provided (rename hf_model_path to hf_model_id)
* /home/user/cache_root changed to /home/container_app_user/cache_root
* fix get_devices_mounts, add mapping
* use MODEL_ID if in container env_vars to map to impl model config
* set defaults for ModelImpl
* add configs for llama 3.x models
* remove HF_TOKEN from tt-studio .env for ease of setup
* add environment file processing

* adds license headers

* Anirud/update vllm setup steps (#189)

* update readme to reflect new flow

* fix readme issues
* add Supported models tab:
  - pointing to tt-inference-server readme
* docs: Update main readme
- add better quick start guide 
- add better notes for running in development mode
* docs: re add Mock model steps
* docs: fix links
* docs: fix vllm
* Update HowToRun_vLLM_Models.md
* Update HowToRun_vLLM_Models.md

Co-authored-by: Tom Stesco <tstesco@tenstorrent.com>
Co-authored-by: Benjamin Goel <bgoel@tenstorrent.com>