Add documentation for vLLM usage (jupyterlab#1232)
* Added documentation on use of `vLLM` for model deployments
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Note to upgrade JAI for use of vllm

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit fbc4895 (parent 94699a8): 6 changed files with 48 additions and 0 deletions.

# Using vLLM in Jupyter AI

[(Return to the Chat Interface page)](index.md#vllm-usage)

`vLLM` is a fast and easy-to-use library for LLM inference and serving. The [vLLM website](https://docs.vllm.ai/en/latest/) explains installation and usage.

:::{note}
To use `vLLM` via `OpenRouter` as described below, you will need to upgrade to `jupyter-ai >= 2.29.1`.
:::

Depending on your hardware setup, install `vLLM` using these [instructions](https://docs.vllm.ai/en/latest/getting_started/installation/index.html). It is best to install it in a dedicated Python environment.

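As a quick sanity check (optional, and not part of the official instructions), you can confirm that the package imports cleanly from that environment:

```python
# Confirm vLLM is importable from the active environment.
import vllm

print(vllm.__version__)
```
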
Once it is installed, you may start serving any model with the command:

```bash
vllm serve <model_name>
```

As an example, the deployment of the `Phi-3-mini-4k-instruct` model is shown below, with checks to make sure it is up and running:

<img src="../_static/vllm-serve.png"
    alt="Screen shot of steps and checks in deploying a model using vllm."
    class="screenshot" />

`vllm` serves the model at the following URL (port `8000` by default): `http://<url>:8000/v1`

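If you prefer to verify the endpoint from code, the sketch below queries the server's model list with the `requests` package. It assumes the server is running locally on the default port `8000`:

```python
import requests

# vLLM exposes an OpenAI-compatible REST API; /v1/models lists the
# model(s) currently being served.
response = requests.get("http://localhost:8000/v1/models", timeout=10)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])
```
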
Start up Jupyter AI and update the AI Settings as follows (note that we are using [OpenRouter](openrouter.md) as the provider, a unified interface for LLMs based on OpenAI's API):

<img src="../_static/vllm-aisettings.png"
    alt="Screen shot of AI settings for using vllm."
    class="screenshot" width="75%"/>

Since vLLM exposes an OpenAI-compatible API, you can test whether the model is available with an API call, as shown:

<img src="../_static/vllm-api.png"
    alt="Screen shot of using vllm programmatically with its API."
    class="screenshot" />

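The same call can be reproduced with the `openai` Python client pointed at the local server. This is a minimal sketch that assumes the default port and the Hugging Face model ID `microsoft/Phi-3-mini-4k-instruct` from the example above; vLLM ignores the API key unless one was configured at startup:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```
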
The model may be used in Jupyter AI's chat interface as shown in the example below:

<img src="../_static/vllm-chat.png"
    alt="Screen shot of using vllm in Jupyter AI chat."
    class="screenshot" width="75%"/>

[(Return to the Chat Interface page)](index.md#vllm-usage)