This README provides a guide for deploying a basic DenseNet image-classification model (in ONNX format) on the Triton Inference Server.
This quickstart guide is an extended version of the official tutorial available at triton-inference-server/tutorials/Quick_Deploy/ONNX/README.md. The official tutorial might be a bit succinct, especially for those new to the Triton Inference Server, so this guide aims to offer more detailed steps to make the deployment process more accessible.
If you're using Linux or macOS, you can follow this quickstart in your terminal. If you're on Windows, please note that CMD will not work with the commands below (they rely on the ${PWD} variable); use Windows PowerShell instead.
Follow the Docker installation instructions tailored to your operating system. You can access a comprehensive step-by-step guide here.
To perform inference on your model with Triton, it's necessary to create a model repository.
The structure of the repository should be:
<model-repository-path>/
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  <model-name>/
    [config.pbtxt]
    [<output-labels-file> ...]
    <version>/
      <model-definition-file>
    <version>/
      <model-definition-file>
    ...
  ...
(The config.pbtxt configuration file is optional. The configuration file will be autogenerated by Triton Inference Server if the user doesn't provide it.)
Therefore, the first step is to set up the directory structure for the model repository.
mkdir -p model_repository/densenet_onnx/1
Next, download the example DenseNet-121 model (in ONNX format) available online and place it in the appropriate directory.
wget -O model_repository/densenet_onnx/1/model.onnx "https://contentmamluswest001.blob.core.windows.net/content/14b2744cf8d6418c87ffddc3f3127242/9502630827244d60a1214f250e3bbca7/08aed7327d694b8dbaee2c97b8d0fcba/densenet121-1.2.onnx"
Now, by running the tree command, you will see the following directory structure, which matches the model repository layout described above.
model_repository
|
+-- densenet_onnx
    |
    +-- 1
        |
        +-- model.onnx
Please ensure that your current working directory is one level above the newly created model repository, so that the path ./model_repository refers to the actual model repository. If you are not in that location, navigate there before continuing.
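Optionally, before starting the server, you can check which input and output tensors the model exposes. Triton reads this information from the ONNX file when it autogenerates the configuration, and the tensor names are also useful when writing a client. The following is a small sketch for this guide (not part of the official tutorial) and assumes the onnx Python package is installed (pip install onnx):

import onnx

# Load the ONNX model that was just downloaded into the repository
model = onnx.load("model_repository/densenet_onnx/1/model.onnx")

# Some older ONNX exports also list weight initializers under graph.input,
# so filter those out and keep only the true runtime inputs
initializer_names = {init.name for init in model.graph.initializer}

for tensor in model.graph.input:
    if tensor.name in initializer_names:
        continue
    dims = [d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print("input :", tensor.name, dims)

for tensor in model.graph.output:
    dims = [d.dim_value for d in tensor.type.tensor_type.shape.dim]
    print("output:", tensor.name, dims)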
Next, run the pre-built Docker container for the Triton Inference Server. The -p flags publish the server's HTTP (8000), gRPC (8001), and metrics (8002) ports, and the -v flag mounts the local model_repository into the container at /models:
docker run --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v "${PWD}/model_repository:/models" nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models
If you encounter a permission error, prepend sudo to the command. If Triton Inference Server version 23.06 is not available, refer to the official release notes to identify the available versions.
Once the Docker image has been pulled and the container is up and running, a large amount of log output will be displayed. Within it, you should find:
+---------------+---------+--------+
| Model | Version | Status |
+---------------+---------+--------+
| densenet_onnx | 1 | READY |
+---------------+---------+--------+
This indicates that our model has been deployed on the server and is now ready to perform inference.
Since the server needs to stay up and running while clients query it, do not close this terminal after starting the server's Docker container, as doing so could stop the container.
To set up the client container, use a separate terminal, distinct from the one running the Triton server.
Run the pre-built Docker container for the Triton client SDK. The --net=host flag lets the client reach the server on localhost, and the -v flag mounts your current directory into the container at /workspace:
docker run -it --rm --net=host -v "${PWD}:/workspace/" nvcr.io/nvidia/tritonserver:23.06-py3-sdk bash
If you encounter a permission error, prepend sudo to the command. If Triton Inference Server version 23.06 is not available, refer to the official release notes to identify the available versions.
Once the Docker image has been successfully pulled and the container is up and running, you will find yourself in an interactive Bash shell session within the container.
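Before writing any client code, you can optionally confirm from inside this container that the server is reachable and the model is loaded. Below is a small sketch for this guide (not part of the official tutorial) using the tritonclient Python package that ships with the SDK image; it assumes the server started above is still running on localhost:8000. You can paste it into a python3 interpreter inside the container.

import tritonclient.http as httpclient

# Connect to the Triton server's HTTP endpoint (reachable on localhost thanks to --net=host)
client = httpclient.InferenceServerClient(url="localhost:8000")

print("server live :", client.is_server_live())
print("server ready:", client.is_server_ready())
print("model ready :", client.is_model_ready("densenet_onnx"))

# Since we didn't supply a config.pbtxt, this prints the configuration
# that Triton autogenerated from the ONNX model
print(client.get_model_config("densenet_onnx"))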
Install the torchvision package, which will be used to preprocess the input image.
pip install torchvision
Download the example photo that we will run inference on.
wget -O img1.jpg "https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg"
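The torchvision package installed above is used to turn this image into the kind of tensor the model expects: a normalized, 224x224, channels-first float array. As a rough sketch of what that preprocessing typically looks like (the exact transforms used in client.py may differ):

from PIL import Image
from torchvision import transforms

# Standard ImageNet-style preprocessing: resize, center-crop, normalize
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("img1.jpg")
input_array = preprocess(img).numpy()   # shape (3, 224, 224), dtype float32
print(input_array.shape, input_array.dtype)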
Download the example Python script client.py, which queries the server.
wget -O client.py "https://raw.githubusercontent.com/Achiwilms/NVIDIA-Triton-Deployment-Quickstart/main/client.py"
Run the client.py script to send the inference request.
python client.py
Once inference finishes and the results are sent back to the client, they will be printed. Each entry in the output has the format <confidence_score>:<classification_index>.
['11.549026:92' '11.232335:14' '7.528014:95' '6.923391:17' '6.576575:88']
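If you want to post-process these results in code, each entry can be split on the colon to recover the score and the class index, for example:

# Each entry has the form "<confidence_score>:<classification_index>"
results = ['11.549026:92', '11.232335:14', '7.528014:95', '6.923391:17', '6.576575:88']
for entry in results:
    score, class_index = entry.split(":")[:2]
    print(f"class {class_index}: confidence {float(score):.3f}")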
To learn more about the request-making process, explore the client.py file; the comments within the script explain each step.
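For reference, the core of such a client typically looks something like the sketch below. This is only a rough, self-contained illustration of the tritonclient HTTP API written for this guide, not the exact contents of client.py; rather than hard-coding tensor names, it asks the server for the model's metadata and assumes the reported input shape is fully specified.

import numpy as np
import tritonclient.http as httpclient
from PIL import Image
from torchvision import transforms

# ImageNet-style preprocessing (same idea as the sketch shown earlier)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
image = preprocess(Image.open("img1.jpg")).numpy()

client = httpclient.InferenceServerClient(url="localhost:8000")

# Ask the server for the model's input/output names, datatypes, and shapes,
# so the sketch does not hard-code tensor names
metadata = client.get_model_metadata("densenet_onnx")
input_meta = metadata["inputs"][0]
output_meta = metadata["outputs"][0]

# Reshape the preprocessed image to the declared input shape
# (assumes the model reports a fixed, fully specified shape)
image = image.reshape(input_meta["shape"]).astype(np.float32)

infer_input = httpclient.InferInput(input_meta["name"], list(image.shape), input_meta["datatype"])
infer_input.set_data_from_numpy(image, binary_data=True)

# class_count asks Triton to return the top-5 classes as "<score>:<index>" strings
infer_output = httpclient.InferRequestedOutput(output_meta["name"], binary_data=True, class_count=5)

response = client.infer(model_name="densenet_onnx", inputs=[infer_input], outputs=[infer_output])
print(response.as_numpy(output_meta["name"]).astype(str))

Running this inside the client container should print the same kind of <confidence_score>:<classification_index> strings shown above.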
You've successfully deployed a model on the Triton Inference Server. Congratulations! 🎉
If you encounter any challenges at any step, please feel free to contact me for assistance at this email address.