[cherry-pick] Release 0.12.1 cherry pick (#343)
* SparseServer.UI example [WIP] (#308)

* add megasparse dir

* edited readme to support DS integration

* small edit to docstring

* edited settings.py module so only 2 models are loaded.

* added more context to the readme about adding more models.

* fixed image

* added a different default host for the Streamlit client

* quality check edits

* quality check commit

* passing copyright quality check

* content edits

* rename dir to sparseserver-ui

* added new config file for quick start

* edited multipipelineclient in settings.py

* changed name of config file

* edited model stubs

* edited readme

* added dependency pins

* changed server pin

* edited model choice logic

* altered front-end features

* style update

* renamed samples file

* added variant descriptions

* style changes

* edited samples.py module

* style changes

* added new pic

* edited README

* updated readme cmds

* SparseServer edit (#314)

* edit number of models

* edit settings.py

* edit readme

* Update label mapping for deepsparse.transformers.eval_downstream (#323)

* Update label mapping for deepsparse.transformers.eval_downstream

* Fix MNLI as well

* bump up main to 0.13.0 (#313)

Co-authored-by: dhuang <dhuang@dhuangs-MacBook-Pro.local>

* AWS Sagemaker example integration (#305)

* AWS Sagemaker example integration

* documentation, sample config, dockerfile fixes

* fix ecr repo name

* readme code changes from testing

* Update huggingface-transformers/README.md with new models (#329)

* Update README.md (#330)

* Update README.md

various grammatical edits
additional edits for section headline consistency

* Topology file for HB120rs_v3 (#334)

* Topology file for HB120rs_v3

Specifies core-per-CCX grouping for HB120rs_v3 VMs, used by the multi-process script.

* Update README.md to reference Azure topo file

* Move all benchmarking within deepsparse/benchmark/ (#333)

* Move all benchmarking within deepsparse/benchmark/

* Update benchmark_model

* Expose results at benchmark base

* isort

* Skip flake8

* server integration check bug fix (#331)

* server integration check bug fix

need to verify integration is set before calling `integration.lower()`

* respond to review - click choice

* add default integration val to server config schema (#337)

* deepsparse.Pipeline - generic pipeline, deepsparse.server support, NLP, IC, OD pipelines (#317)

* base commit - make pydantic a general req

* Pipeline base class implementation (#315)

* Pipeline base class implementation

* constructor default values

* __call__ inputs/outputs parsing + validation

* documentation

* pipeline 'alias' argument

* review fixes

* [feature/Pipeline] PipelineConfig (#318)

* PipelineConfig pydantic model + Pipeline.from_config

* Pipeline.to_config() function

* [feature/Pipeline] refactor deepsparse.server to use deepsparse.Pipeline (#319)

* PipelineConfig pydantic model + Pipeline.from_config

* Pipeline.to_config() function

* refactor deepsparse.server to use deepsparse.Pipeline

* review nit fix

remove files for separate feature

* Image Classification Pipeline Integration (#322)

* Create a command line installable for image classification pipeline

* Intermediate Commit

* Image Classification pipeline implementation

* Remove faulty entry point

* Apply suggestions from @bogunowicz


* Changed function name from `_infer_input_shape` to `_infer_image_shape`

* Add validation script for Image Classification pipeline (#328)

* Add Validation Script for Image Classification Models

* Update pipelines and corresponding schemas to work with numpy arrays

* Bugfix if prediction to be converted to int if it's a string

* Update docstring

* Update src/deepsparse/image_classification/validation_script.py

* [feature/Pipeline] fixes for ic-pipelines implementation (#336)

* fixes for ic-pipelines implementation

* sparsezoo support

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

* Update src/deepsparse/pipeline.py


* quality

* [feature/Pipeline] deepsparse.Pipeline implementations for transformers (#320)

* add parsing layer for deepsparse.Pipeline for implementation flexibility

* initial deepsparse.transformers.pipelines implementation base class + text_classification

* make tokenizer and config attributes 'public', but not properties

* decorator fix

* use kwargs for transformers pipeline parent class args

* token classification impl

* token classification output schema parsing

* quality

* question answering pipeline impl

* fixes for pipeline impls - bs1 sanity check inferences working

* [feature/Pipeline] deprecate and migrate existing transformers pipelines (#335)

* remove old pipeline pathway and files, add API for deprecated pathway

* migrate eval_downstream

* update readme

* server pipeline input fix

* hf license attribution

* `YOLO` pipeline integration for deepsparse (#327)

* Added YOLO pipeline
Add an installable for yolo integration
Added a task for YOLO

To install run:
* `pip install --editable "./[yolo]"`

* Changed function name from `_infer_input_shape` to `_infer_image_shape`

* Update docstring

* Comments from @bogunowicz
* Moved COCO classes to a file

* Adds support to annotate images using YOLO (#332)

* Adds support to annotate images using YOLO

* Makes `YOLOOutput` iterable
* Returns a named tuple of image outputs when `next` is called on `YOLOOutput`
* Adds an annotate function to yolo utils

* Adds an annotation script, testing + minor fixes remain

* Intermediate-commit

* Intermediate WIP

* Working State with required bugfixes

* style fixes

Co-authored-by: Benjamin <ben@neuralmagic.com>

Co-authored-by: Benjamin <ben@neuralmagic.com>

* [feature/Pipeline] rename input/output _model to _schema (#340)

* rename input/output _model to _schema

* refactor yolo pipeline

* default model support for Pipeline.register (#339)

* default model support for Pipeline.register

* update default stubs for transformers and IC

* yolo default model

* minor fixes

* model->schema for server

Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* Remove: starlette dep (#338)

* Update src/deepsparse/version.py

Co-authored-by: Ricky Costa <79061523+InquestGeronimo@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuang <dhuang@dhuangs-MacBook-Pro.local>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>
Co-authored-by: Govind Ramnarayan <77341216+govindr-nm@users.noreply.github.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
10 people authored May 2, 2022
1 parent 01a427a commit ddf0ed6
Showing 56 changed files with 4,861 additions and 1,811 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -139,12 +139,12 @@ deepsparse.benchmark [-h] [-b BATCH_SIZE] [-shapes INPUT_SHAPES]
## 👩‍💻 NLP Inference Example

```python
from deepsparse.transformers import pipeline
from deepsparse import Pipeline
# SparseZoo model stub or path to ONNX file
model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
qa_pipeline = pipeline(
qa_pipeline = Pipeline.create(
task="question-answering",
model_path=model_path,
)
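# A hedged usage sketch (the rest of this example is elided in the diff view):
# the created pipeline is typically invoked with keyword inputs, e.g.
# inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")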
16 changes: 16 additions & 0 deletions examples/amd-azure/HB120rs_v3.json
@@ -0,0 +1,16 @@
[[24,25,26,27,28,29],
[0,1,2,3,4,5,6,7],
[8,9,10,11,12,13,14,15],
[16,17,18,19,20,21,22,23],
[54,55,56,57,58,59],
[30,31,32,33,34,35,36,37],
[38,39,40,41,42,43,44,45],
[46,47,48,49,50,51,52,53],
[84,85,86,87,88,89],
[60,61,62,63,64,65,66,67],
[68,69,70,71,72,73,74,75],
[76,77,78,79,80,81,82,83],
[114,115,116,117,118,119],
[90,91,92,93,94,95,96,97],
[98,99,100,101,102,103,104,105],
[106,107,108,109,110,111,112,113]]
6 changes: 5 additions & 1 deletion examples/amd-azure/README.md
@@ -46,7 +46,11 @@ For this benchmarking script, users must specify the topology of their system wi

Each list of cores contains the processor IDs that one of the worker processes will run on and should reflect the topology of the system. For performance, each list in the JSON topology file should contain cores that are on the same socket, on the same NUMA node, or that share the same L3 cache.

The `/examples/amd-azure` directory contains an example JSON file that can be used. `amd_epyc_7713.json` is suitable for a two-socket system with AMD EPYC 7713 processors. This file will also work for a one-socket system if the proper parameter for `nstreams` is passed into `multi_process_benchmark.py`.
The `/examples/amd-azure` directory contains two example JSON files that can be used.

`amd_epyc_7713.json` is suitable for a two-socket system with AMD EPYC 7713 processors. This file will also work for a one-socket system if the proper parameter for `nstreams` is passed into `multi_process_benchmark.py`.

`HB120rs_v3.json` is suitable for an Azure HB120rs_v3 virtual machine, a two-socket machine with AMD Milan-X processors and 120 cores in total. You may notice that not every process will use the same number of cores when using this topology. This is because some of the CCXs on this instance type have some cores dedicated to running the hypervisor.
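As a quick sanity check, the topology file can be inspected with a short Python snippet (a hedged sketch, not part of the repository), where each inner list is the set of core IDs one worker process is pinned to:

```python
import json

# Load the topology file; each inner list holds the core IDs assigned to one
# worker process (16 groups for HB120rs_v3, with 6 or 8 cores per group).
with open("HB120rs_v3.json") as topology_file:
    core_groups = json.load(topology_file)

print(f"{len(core_groups)} worker processes defined")
for index, cores in enumerate(core_groups):
    print(f"process {index}: {len(cores)} cores -> {cores}")
```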

## Usage

2 changes: 1 addition & 1 deletion examples/amd-azure/multi_process_benchmark.py
@@ -23,7 +23,7 @@

import numa
from deepsparse import compile_model
from deepsparse.benchmark_model.stream_benchmark import singlestream_benchmark
from deepsparse.benchmark.stream_benchmark import singlestream_benchmark
from deepsparse.log import set_logging_level
from deepsparse.utils import (
    generate_random_inputs,
33 changes: 33 additions & 0 deletions examples/aws-sagemaker/Dockerfile
@@ -0,0 +1,33 @@
FROM python:3.8

ARG config_path=./config.yaml

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*


COPY ${config_path} /root/server-config.yaml

ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"


RUN python3 -m venv $VIRTUAL_ENV && \
    pip3 install --no-cache-dir --upgrade pip && \
    pip3 install --no-cache-dir "deepsparse-nightly[server]" # TODO: switch to deepsparse[server] >= 0.12

# create 'serve' command for sagemaker entrypoint
RUN mkdir /opt/server/
RUN echo "#! /bin/bash" > /opt/server/serve
RUN echo "deepsparse.server --port 8080 --config_file /root/server-config.yaml" >> /opt/server/serve
RUN chmod 777 /opt/server/serve

ENV PATH="/opt/server:${PATH}"
WORKDIR /opt/server

ENTRYPOINT ["bash", "/opt/server/serve"]
266 changes: 266 additions & 0 deletions examples/aws-sagemaker/README.md
@@ -0,0 +1,266 @@
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Deploying DeepSparse with Amazon SageMaker

[Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/index.html)
offers an easy-to-use infrastructure for deploying deep learning models at scale.
This directory provides a guided example for deploying a
[DeepSparse](https://github.com/neuralmagic/deepsparse) inference server on SageMaker.
Deployments benefit from both sparse-CPU acceleration with
DeepSparse and automatic scaling from SageMaker.


## Contents
In addition to the step-by-step instructions in this guide, the directory contains
additional files to aid in the deployment.

### Dockerfile
The included `Dockerfile` builds an image on top of the standard `python:3.8` image
with `deepsparse` installed and creates an executable command `serve` that runs
`deepsparse.server` on port 8080. SageMaker will execute this image by running
`docker run serve` and expects the image to serve inference requests at the
`invocations/` endpoint.

For general customization of the server, changes should be made to the `config.yaml`
file that the Dockerfile reads from, rather than to the Dockerfile itself.

### config.yaml
`config.yaml` is used to configure the DeepSparse server running in the Dockerfile.
The config must contain the line `integration: sagemaker` so
endpoints may be provisioned correctly to match SageMaker specifications.

Notice that the `model_path` and `task` are set to run a sparse-quantized
question-answering model from [SparseZoo](https://sparsezoo.neuralmagic.com/).
To use a model directory stored in `s3`, set `model_path` to `/opt/ml/model` in
the config and add `ModelDataUrl=<MODEL-S3-PATH>` to the `CreateModel` arguments.
SageMaker will automatically copy the files from the s3 path into `/opt/ml/model`
which the server can then read from.
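For illustration, a hedged sketch of what the config might look like when reading model files copied in from S3 (the field layout follows the bundled `config.yaml` shown later in this example):

```yaml
# hypothetical config.yaml variant for a model stored in S3;
# SageMaker copies the contents of ModelDataUrl into /opt/ml/model
models:
  - task: question_answering
    model_path: /opt/ml/model
    batch_size: 1
integration: sagemaker
```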

More information on the DeepSparse server and its configuration can be found
[here](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server#readme).


## Deploying to SageMaker
The following steps are required to provision and deploy DeepSparse to SageMaker
for inference:
* Build the DeepSparse-SageMaker `Dockerfile` into a local docker image
* Create an [Amazon ECR](https://aws.amazon.com/ecr/) repository to host the image
* Push the image to the ECR repository
* Create a SageMaker `Model` that reads from the hosted ECR image
* Build a SageMaker `EndpointConfig` that defines how to provision the model deployment
* Launch the SageMaker `Endpoint` defined by the `Model` and `EndpointConfig`

### Requirements
The listed steps can be completed using `python` and `bash`. The following
credentials, tools, and libraries are also required:
* The [`aws` cli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) that is [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html)
* The [ARN of an AWS role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) your user has access to that has full SageMaker and ECR permissions. In the following steps, we will refer to this as `ROLE_ARN`. It should take the form `"arn:aws:iam::XXX:role/service-role/XXX"`
* [Docker and the `docker` cli](https://docs.docker.com/get-docker/)
* The `boto3` python AWS sdk (`pip install boto3`)

### Building the DeepSparse-SageMaker Image Locally
The `Dockerfile` can be built from this directory in a bash shell using the following command.
The image will be tagged locally as `deepsparse-sagemaker-example`.

```bash
docker build -t deepsparse-sagemaker-example .
```
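Optionally, the image can be smoke-tested locally before pushing it to ECR. The sketch below is a hedged example, not part of this repository; it assumes the server exposes the SageMaker-style `/invocations` route and listens on a host reachable from outside the container, and it reuses the question-answering payload shown later in this guide.

```bash
# run the freshly built image locally (hypothetical smoke test)
docker run --rm -p 8080:8080 deepsparse-sagemaker-example

# in another shell, POST a question-answering payload to the invocations endpoint
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"question": "Where do I live?", "context": "I am a student and I live in Cambridge"}'
```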

### Creating an ECR Repository
The following code snippet can be used in Python to create an ECR repository.
The `region_name` can be swapped to a preferred region. The repository will be named
`deepsparse-sagemaker`. If the repository is already created, this step may be skipped.

```python
import boto3

ecr = boto3.client("ecr", region_name='us-east-1')
create_repository_res = ecr.create_repository(repositoryName="deepsparse-sagemaker")
```
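The response contains the repository URI, which can be noted for the push step (a hedged sketch of reading the standard `boto3` response):

```python
print(create_repository_res["repository"]["repositoryUri"])
```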

### Pushing the Local Image to the ECR Repository
Once the image is built and the ECR repository is created, the image can be pushed using the following
bash commands.

```bash
account=$(aws sts get-caller-identity --query Account | sed -e 's/^"//' -e 's/"$//')
region=$(aws configure get region)
ecr_account=${account}.dkr.ecr.${region}.amazonaws.com

aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr_account
fullname=$ecr_account/deepsparse-sagemaker:latest

docker tag deepsparse-sagemaker-example:latest $fullname
docker push $fullname
```

An abbreviated successful output will look like:
```
Login Succeeded
The push refers to repository [XXX.dkr.ecr.us-east-1.amazonaws.com/deepsparse-example]
3c2284f66840: Preparing
08fa02ce37eb: Preparing
a037458de4e0: Preparing
bafdbe68e4ae: Preparing
a13c519c6361: Preparing
6817758dd480: Waiting
6d95196cbe50: Waiting
e9872b0f234f: Waiting
c18b71656bcf: Waiting
2174eedecc00: Waiting
03ea99cd5cd8: Pushed
585a375d16ff: Pushed
5bdcc8e2060c: Pushed
latest: digest: sha256:XXX size: 3884
```

### Creating a SageMaker Model
A SageMaker `Model` can now be created referencing the pushed image.
The example model will be named `question-answering-example`.
As mentioned in the requirements, `ROLE_ARN` should be the string ARN of an AWS
role with full access to SageMaker.

```python
sm_boto3 = boto3.client("sagemaker", region_name="us-east-1")

region = boto3.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]

image_uri = "{}.dkr.ecr.{}.amazonaws.com/deepsparse-sagemaker:latest".format(account_id, region)

create_model_res = sm_boto3.create_model(
    ModelName="question-answering-example",
    Containers=[
        {
            "Image": image_uri,
        },
    ],
    ExecutionRoleArn=ROLE_ARN,
    EnableNetworkIsolation=False,
)
```

More information about options for configuring SageMaker `Model` instances can
be found [here](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html).


### Building a SageMaker EndpointConfig
The `EndpointConfig` is used to set the instance type to provision, how many instances
to launch, scaling rules, and other deployment settings. The following code snippet
defines an endpoint with a single machine using an `ml.c5.large` CPU instance.

* [Full list of available instances](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html) (See Compute optimized (no GPUs) section)
* [EndpointConfig documentation and options](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html)

```python
model_name = "question-answering-example" # model defined above
initial_instance_count = 1
instance_type = "ml.c5.large"

variant_name = "QuestionAnsweringDeepSparseDemo" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

production_variants = [
    {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": initial_instance_count,
        "InstanceType": instance_type,
    }
]

endpoint_config_name = "QuestionAnsweringExampleConfig" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

endpoint_config = {
    "EndpointConfigName": endpoint_config_name,
    "ProductionVariants": production_variants,
}

endpoint_config_res = sm_boto3.create_endpoint_config(**endpoint_config)
```

### Launching a SageMaker Endpoint
Once the `EndpointConfig` is defined, the endpoint can be easily launched using
the `create_endpoint` command:

```python
endpoint_name = "question-answering-example-endpoint"
endpoint_res = sm_boto3.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
```

After creating the endpoint, its status can be checked by running the following.
Initially, the `EndpointStatus` will be `Creating`. Once the image is successfully
launched, it will change to `InService`. If there are any errors, it will
become `Failed`.

```python
from pprint import pprint
pprint(sm_boto3.describe_endpoint(EndpointName=endpoint_name))
```
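Alternatively, a `boto3` waiter can block until the endpoint is ready (a hedged sketch using the standard SageMaker waiter):

```python
waiter = sm_boto3.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)
```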


## Making a Request to the Endpoint
After the endpoint is in service, requests can be made to it through the
`invoke_endpoint` API. Inputs will be passed as a JSON payload.

```python
import json

sm_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

body = json.dumps(
    dict(
        question="Where do I live?",
        context="I am a student and I live in Cambridge",
    )
)

content_type = "application/json"
accept = "text/plain"

res = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body,
    ContentType=content_type,
    Accept=accept,
)

print(res["Body"].readlines())
```
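Because the body is returned as a JSON stream, it can also be decoded directly (a hedged sketch; the exact field names depend on the question-answering pipeline's output schema):

```python
import json

# read the streamed body once and decode it to inspect the returned fields
result = json.loads(res["Body"].read().decode("utf-8"))
print(result)
```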


### Cleanup
The model and endpoint can be deleted with the following commands:
```python
sm_boto3.delete_endpoint(EndpointName=endpoint_name)
sm_boto3.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_boto3.delete_model(ModelName=model_name)
```

## Next Steps
These steps create an invokable SageMaker inference endpoint powered by the DeepSparse
Engine. The `EndpointConfig` settings may be adjusted to set instance scaling rules based
on deployment needs.

More information on deploying custom models with SageMaker can be found
[here](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html).

Open an [issue](https://github.com/neuralmagic/deepsparse/issues)
or reach out to the [DeepSparse community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)
with any issues, questions, or ideas.
5 changes: 5 additions & 0 deletions examples/aws-sagemaker/config.yaml
@@ -0,0 +1,5 @@
models:
  - task: question_answering
    model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-moderate
    batch_size: 1
integration: sagemaker
2 changes: 1 addition & 1 deletion examples/benchmark/check_correctness.py
@@ -45,7 +45,7 @@
import argparse

from deepsparse import compile_model, cpu
from deepsparse.benchmark_model.ort_engine import ORTEngine
from deepsparse.benchmark.ort_engine import ORTEngine
from deepsparse.utils import (
    generate_random_inputs,
    model_to_path,
6 changes: 3 additions & 3 deletions examples/benchmark/run_benchmark.py
@@ -145,13 +145,13 @@ def main():
        inputs, num_iterations, num_warmup_iterations, include_outputs=True
    )

    for dse_output, ort_output in zip(dse_results.outputs, ort_results.outputs):
        verify_outputs(dse_output, ort_output)

    print("ONNXRuntime", ort_results)
    print()
    print("DeepSparse Engine", dse_results)

    for dse_output, ort_output in zip(dse_results.outputs, ort_results.outputs):
        verify_outputs(dse_output, ort_output)


if __name__ == "__main__":
    main()