Add documentation for vLLM usage (jupyterlab#1232)
* Added documentation on use of `vLLM` for model deployments
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* Note to upgrade JAI for use of vllm

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Commit fbc4895 (parent 94699a8): 6 changed files with 48 additions and 0 deletions.

# Using vLLM in Jupyter AI

[(Return to the Chat Interface page)](index.md#vllm-usage)

`vLLM` is a fast and easy-to-use library for LLM inference and serving. The [vLLM website](https://docs.vllm.ai/en/latest/) explains installation and usage.

:::{note}
To use `vLLM` via `OpenRouter` as described below, you will need to upgrade to `jupyter-ai >= 2.29.1`.
:::

Depending on your hardware setup, install `vLLM` using these [instructions](https://docs.vllm.ai/en/latest/getting_started/installation/index.html). It is best to install it in a dedicated Python environment.

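As a quick sanity check (optional, and not part of the official instructions), you can confirm that the package imports cleanly from that environment:

```python
# Confirm vLLM is importable from the active environment.
import vllm

print(vllm.__version__)
```
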
Once it is installed, you may start serving any model with the command:

```bash
vllm serve <model_name>
```

As an example, the deployment of the `Phi-3-mini-4k-instruct` model is shown below, with checks to make sure it is up and running:

<img src="../_static/vllm-serve.png"
    alt="Screen shot of steps and checks in deploying a model using vllm."
    class="screenshot" />

`vllm` serves the model at the following URL (port `8000` by default): `http://<url>:8000/v1`

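If you prefer to verify the endpoint from code, the sketch below queries the server's model list with the `requests` package. It assumes the server is running locally on the default port `8000`:

```python
import requests

# vLLM exposes an OpenAI-compatible REST API; /v1/models lists the
# model(s) currently being served.
response = requests.get("http://localhost:8000/v1/models", timeout=10)
response.raise_for_status()
for model in response.json()["data"]:
    print(model["id"])
```
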
Start up Jupyter AI and update the AI Settings as follows (note that we are using [OpenRouter](openrouter.md) as the provider, a unified interface for LLMs based on OpenAI's API):

<img src="../_static/vllm-aisettings.png"
    alt="Screen shot of AI settings for using vllm."
    class="screenshot" width="75%"/>

Since vLLM exposes an OpenAI-compatible API, you can test whether the model is available with an API call, as shown:

<img src="../_static/vllm-api.png"
    alt="Screen shot of using vllm programmatically with its API."
    class="screenshot" />

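The same call can be reproduced with the `openai` Python client pointed at the local server. This is a minimal sketch that assumes the default port and the Hugging Face model ID `microsoft/Phi-3-mini-4k-instruct` from the example above; vLLM ignores the API key unless one was configured at startup:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(completion.choices[0].message.content)
```
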
The model may be used in Jupyter AI's chat interface as shown in the example below:

<img src="../_static/vllm-chat.png"
    alt="Screen shot of using vllm in Jupyter AI chat."
    class="screenshot" width="75%"/>

[(Return to the Chat Interface page)](index.md#vllm-usage)