From 4c7c6f315851894cf37ae865d1f564c6e2bb8e21 Mon Sep 17 00:00:00 2001
From: dhuangnm
Date: Mon, 22 Jul 2024 14:15:33 -0400
Subject: [PATCH] Add README.md contents

---
 README.md | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 48 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 8130431..1b35651 100644
--- a/README.md
+++ b/README.md
@@ -1 +1,48 @@
-# nm-vllm-certs
\ No newline at end of file
+# nm-vllm-certs
+
+## Overview
+
+The `nm-vllm` packages published in this repository are the Neural Magic Enterprise Edition of [vLLM](https://github.com/vllm-project/vllm). Packages are released as versioned Python wheels and Docker images, either as "production level" official releases or as "beta level" nightly builds.
+
+Official releases are made at the discretion of Neural Magic, but typically track `vllm` releases. These wheels are available via public PyPI as well as ["nm-pypi"](https://pypi.neuralmagic.com).
+
+Nightly builds are released every night, provided the automation runs are green. These wheels are available at ["nm-pypi"](https://pypi.neuralmagic.com).
+
+## Installation
+
+### PyPI
+The [nm-vllm PyPI package](https://pypi.neuralmagic.com/simple/nm-vllm/index.html) includes pre-compiled binaries for CUDA (version 12.1) kernels. For other PyTorch or CUDA versions, please compile the package from source.
+
+Install it using pip:
+```bash
+pip install nm-vllm --extra-index-url https://pypi.neuralmagic.com/simple
+```
+
+To use the weight sparsity features, include the optional `sparse` dependencies:
+```bash
+pip install nm-vllm[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
+```
+
+You can also build and install `nm-vllm` from source (this takes about 10 minutes):
+```bash
+git clone https://github.com/neuralmagic/nm-vllm.git
+cd nm-vllm
+pip install -e .[sparse] --extra-index-url https://pypi.neuralmagic.com/simple
+```
+
+### Docker
+
+The [`nm-vllm-ent` container registry](https://github.com/neuralmagic/nm-vllm-certs/pkgs/container/nm-vllm-ent) includes pre-built Docker images.
+
+Launch the OpenAI-compatible server with:
+
+```bash
+MODEL_ID=Qwen/Qwen2-0.5B-Instruct
+docker run --gpus all --shm-size 2g ghcr.io/neuralmagic/nm-vllm-ent:latest --model $MODEL_ID
+```
+
+## Models
+
+Neural Magic maintains a variety of optimized models on our Hugging Face organization profiles:
+- [neuralmagic](https://huggingface.co/neuralmagic)
+- [nm-testing](https://huggingface.co/nm-testing)
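The server launched by the `docker run` command above exposes an OpenAI-compatible HTTP API, so any standard client can talk to it. A minimal sketch of the request shape, using only the Python standard library; the host and port (`localhost:8000`, vLLM's usual default) are illustrative assumptions, not part of the patch:

```python
import json
from urllib import request

# Assumed endpoint: vLLM's OpenAI-compatible server listens on port 8000 by default.
URL = "http://localhost:8000/v1/chat/completions"

# Request body in the OpenAI chat-completions format the server understands.
payload = {
    "model": "Qwen/Qwen2-0.5B-Instruct",  # same model passed to `docker run`
    "messages": [{"role": "user", "content": "What is weight sparsity?"}],
    "max_tokens": 64,
}

req = request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the container is up and reachable:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because `data` is set, `urllib` issues a POST, matching what the chat-completions endpoint expects.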