[cherry-pick] Release 0.12.1 cherry pick (#343)
* SparseServer.UI example [WIP] (#308)

* add megasparse dir

* edited readme to support DS integration

* small edit to docstring

* edited settings.py module so only 2 models are loaded.

* added more context to the readme about adding more models.

* fixed image

* added a different default host for the Streamlit client

* quality check edits

* quality check commit

* passing copyright quality check

* content edits

* rename dir to sparseserver-ui

* added new config file for quick start

* edited multipipelineclient in settings.py

* changed name of config file

* edited model stubs

* edited readme

* added dependency pins

* changed server pin

* edited model choice logic

* altered front-end features

* style update

* renamed samples file

* added variant descriptions

* style changes

* edited samples.py module

* style changes

* added new pic

* edited README

* updated readme cmds

* SparseServer edit (#314)

* edit number of models

* edit settings.py

* edit readme

* Update label mapping for deepsparse.transformers.eval_downstream (#323)

* Update label mapping for deepsparse.transformers.eval_downstream

* Fix MNLI as well

* bump up main to 0.13.0 (#313)

Co-authored-by: dhuang <dhuang@dhuangs-MacBook-Pro.local>

* AWS Sagemaker example integration (#305)

* AWS Sagemaker example integration

* documentation, sample config, dockerfile fixes

* fix ecr repo name

* readme code changes from testing

* Update huggingface-transformers/README.md with new models (#329)

* Update README.md (#330)

* Update README.md

various grammatical edits
additional edits for section headline consistency

* Topology file for HB120rs_v3 (#334)

* Topology file for HB120rs_v3

Specifies core-per-CCX grouping for HB120rs_v3 VMs, used by the multi-process script.

* Update README.md to reference Azure topo file

* Move all benchmarking within deepsparse/benchmark/ (#333)

* Move all benchmarking within deepsparse/benchmark/

* Update benchmark_model

* Expose results at benchmark base

* isort

* Skip flake8

* server integration check bug fix (#331)

* server integration check bug fix

need to verify integration is set before calling `integration.lower()`

* respond to review - click choice

* add default integration val to server config schema (#337)

* deepsparse.Pipeline - generic pipeline, deepsparse.server support, NLP, IC, OD pipelines (#317)

* base commit - make pydantic a general req

* Pipeline base class implementation (#315)

* Pipeline base class implementation

* constructor default values

* __call__ inputs/outputs parsing + validation

* documentation

* pipeline 'alias' argument

* review fixes

* [feature/Pipeline] PipelineConfig (#318)

* PipelineConfig pydantic model + Pipeline.from_config

* Pipeline.to_config() function

* [feature/Pipeline] refactor deepsparse.server to use deepsparse.Pipeline (#319)

* PipelineConfig pydantic model + Pipeline.from_config

* Pipeline.to_config() function

* refactor deepsparse.server to use deepsparse.Pipeline

* review nit fix

remove files for separate feature

* Image Classification Pipeline Integration (#322)

* Create a command line installable for image classification pipeline

* Intermediate Commit

* Image Classification pipeline implementation

* Remove faulty entry point

* Apply suggestions from @bogunowicz


* Changed function name from `_infer_input_shape` to `_infer_image_shape`

* Add validation script for Image Classification pipeline (#328)

* Add Validation Script for Image Classification Models

* Update pipelines and corresponding schemas to work with numpy arrays

* Bugfix if prediction to be converted to int if it's a string

* Update docstring

* Update src/deepsparse/image_classification/validation_script.py

* [feature/Pipeline] fixes for ic-pipelines implementation (#336)

* fixes for ic-pipelines implementation

* sparsezoo support

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>

* Update src/deepsparse/pipeline.py


* quality

* [feature/Pipeline] deepsparse.Pipeline implementations for transformers (#320)

* add parsing layer for deepsparse.Pipeline for implementation flexibility

* initial deepsparse.transformers.pipelines implementation base class + text_classification

* make tokenizer and config attributes 'public', but not properties

* decorator fix

* use kwargs for transformers pipeline parent class args

* token classification impl

* token classification output schema parsing

* quality

* question answering pipeline impl

* fixes for pipeline impls - bs1 sanity check inferences working

* [feature/Pipeline] deprecate and migrate existing transformers pipelines (#335)

* remove old pipeline pathway and files, add API for deprecated pathway

* migrate eval_downstream

* update readme

* server pipeline input fix

* hf license attribution

* `YOLO` pipeline integration for deepsparse (#327)

* Added YOLO pipeline
Add an installable for yolo integration
Added a task for YOLO

To install run:
* `pip install --editable "./[yolo]"`

* Changed function name from `_infer_input_shape` to `_infer_image_shape`

* Update docstring

* Comments from @bogunowicz
* Moved COCO classes to a file

* Adds support to annotate images using YOLO (#332)

* Adds support to annotate images using YOLO

* Makes `YOLOOutput` iterable
* Returns a named tuple of image outputs when `next` is called on `YOLOOutput`
* Adds an annotate function to yolo utils

* Adds an annotation script, testing + minor fixes remain

* Intermediate-commit

* Intermediate WIP

* Working State with required bugfixes

* style fixes

Co-authored-by: Benjamin <ben@neuralmagic.com>

Co-authored-by: Benjamin <ben@neuralmagic.com>

* [feature/Pipeline] rename input/output _model to _schema (#340)

* rename input/output _model to _schema

* refactor yolo pipeline

* default model support for Pipeline.register (#339)

* default model support for Pipeline.register

* update default stubs for transformers and IC

* yolo default model

* minor fixes

* model->schema for server

Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>

* Remove: starlette dep (#338)

* Update src/deepsparse/version.py

Co-authored-by: Ricky Costa <79061523+InquestGeronimo@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: dhuangnm <74931910+dhuangnm@users.noreply.github.com>
Co-authored-by: dhuang <dhuang@dhuangs-MacBook-Pro.local>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>
Co-authored-by: Govind Ramnarayan <77341216+govindr-nm@users.noreply.github.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: Konstantin Gulin <66528950+KSGulin@users.noreply.github.com>
10 people authored May 2, 2022
1 parent 01a427a commit ddf0ed6
Showing 56 changed files with 4,861 additions and 1,811 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -139,12 +139,12 @@ deepsparse.benchmark [-h] [-b BATCH_SIZE] [-shapes INPUT_SHAPES]
## 👩‍💻 NLP Inference Example

```python
from deepsparse.transformers import pipeline
from deepsparse import Pipeline
# SparseZoo model stub or path to ONNX file
model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/12layer_pruned80_quant-none-vnni"
qa_pipeline = pipeline(
qa_pipeline = Pipeline.create(
task="question-answering",
model_path=model_path,
)
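# A hedged usage sketch (the rest of this example is elided in the diff view):
# the created pipeline is typically invoked with keyword inputs, e.g.
# inference = qa_pipeline(question="What's my name?", context="My name is Snorlax")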
16 changes: 16 additions & 0 deletions examples/amd-azure/HB120rs_v3.json
@@ -0,0 +1,16 @@
[[24,25,26,27,28,29],
[0,1,2,3,4,5,6,7],
[8,9,10,11,12,13,14,15],
[16,17,18,19,20,21,22,23],
[54,55,56,57,58,59],
[30,31,32,33,34,35,36,37],
[38,39,40,41,42,43,44,45],
[46,47,48,49,50,51,52,53],
[84,85,86,87,88,89],
[60,61,62,63,64,65,66,67],
[68,69,70,71,72,73,74,75],
[76,77,78,79,80,81,82,83],
[114,115,116,117,118,119],
[90,91,92,93,94,95,96,97],
[98,99,100,101,102,103,104,105],
[106,107,108,109,110,111,112,113]]
6 changes: 5 additions & 1 deletion examples/amd-azure/README.md
@@ -46,7 +46,11 @@ For this benchmarking script, users must specify the topology of their system wi

Each list of cores contains the processor IDs that one of the worker processes will run on and should reflect the topology of the system. For performance, each list in the JSON topology file should contain cores that are on the same socket, on the same NUMA node, or that share the same L3 cache.

The `/examples/amd-azure` directory contains an example JSON file that can be used. `amd_epyc_7713.json` is suitable for a two-socket system with AMD EPYC 7713 processors. This file will also work for a one-socket system if the proper parameter for `nstreams` is passed into `multi_process_benchmark.py`.
The `/examples/amd-azure` directory contains two example JSON files that can be used.

`amd_epyc_7713.json` is suitable for a two-socket system with AMD EPYC 7713 processors. This file will also work for a one-socket system if the proper parameter for `nstreams` is passed into `multi_process_benchmark.py`.

`HB120rs_v3.json` is suitable for an Azure HB120rs_v3 virtual machine, a two-socket machine with AMD Milan-X processors and 120 cores in total. You may notice that not every process will use the same number of cores when using this topology. This is because some of the CCXs on this instance type have some cores dedicated to running the hypervisor.
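As a quick sanity check, the topology file can be inspected with a short Python snippet (a hedged sketch, not part of the repository), where each inner list is the set of core IDs one worker process is pinned to:

```python
import json

# Load the topology file; each inner list holds the core IDs assigned to one
# worker process (16 groups for HB120rs_v3, with 6 or 8 cores per group).
with open("HB120rs_v3.json") as topology_file:
    core_groups = json.load(topology_file)

print(f"{len(core_groups)} worker processes defined")
for index, cores in enumerate(core_groups):
    print(f"process {index}: {len(cores)} cores -> {cores}")
```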

## Usage

2 changes: 1 addition & 1 deletion examples/amd-azure/multi_process_benchmark.py
@@ -23,7 +23,7 @@

import numa
from deepsparse import compile_model
from deepsparse.benchmark_model.stream_benchmark import singlestream_benchmark
from deepsparse.benchmark.stream_benchmark import singlestream_benchmark
from deepsparse.log import set_logging_level
from deepsparse.utils import (
    generate_random_inputs,
33 changes: 33 additions & 0 deletions examples/aws-sagemaker/Dockerfile
@@ -0,0 +1,33 @@
FROM python:3.8

ARG config_path=./config.yaml

USER root

RUN apt-get -qq -y update && \
    apt-get -qq -y upgrade && \
    apt-get -y autoclean && \
    apt-get -y autoremove && \
    rm -rf /var/lib/apt/lists/*


COPY ${config_path} /root/server-config.yaml

ENV VIRTUAL_ENV=/venv
ENV PATH="$VIRTUAL_ENV/bin:$PATH"


RUN python3 -m venv $VIRTUAL_ENV && \
    pip3 install --no-cache-dir --upgrade pip && \
    pip3 install --no-cache-dir "deepsparse-nightly[server]" # TODO: switch to deepsparse[server] >= 0.12

# create 'serve' command for sagemaker entrypoint
RUN mkdir /opt/server/
RUN echo "#! /bin/bash" > /opt/server/serve
RUN echo "deepsparse.server --port 8080 --config_file /root/server-config.yaml" >> /opt/server/serve
RUN chmod 777 /opt/server/serve

ENV PATH="/opt/server:${PATH}"
WORKDIR /opt/server

ENTRYPOINT ["bash", "/opt/server/serve"]
266 changes: 266 additions & 0 deletions examples/aws-sagemaker/README.md
@@ -0,0 +1,266 @@
<!--
Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Deploying DeepSparse with Amazon SageMaker

[Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/index.html)
offers an easy-to-use infrastructure for deploying deep learning models at scale.
This directory provides a guided example for deploying a
[DeepSparse](https://github.com/neuralmagic/deepsparse) inference server on SageMaker.
Deployments benefit from both sparse-CPU acceleration with
DeepSparse and automatic scaling from SageMaker.


## Contents
In addition to the step-by-step instructions in this guide, the directory contains
additional files to aid in the deployment.

### Dockerfile
The included `Dockerfile` builds an image on top of the standard `python:3.8` image
with `deepsparse` installed and creates an executable command `serve` that runs
`deepsparse.server` on port 8080. SageMaker will execute this image by running
`docker run serve` and expects the image to serve inference requests at the
`invocations/` endpoint.

For general customization of the server, changes should be made to the `config.yaml`
file that the Dockerfile reads from, rather than to the Dockerfile itself.

### config.yaml
`config.yaml` is used to configure the DeepSparse server running in the Dockerfile.
The config must contain the line `integration: sagemaker` so
endpoints may be provisioned correctly to match SageMaker specifications.

Notice that the `model_path` and `task` are set to run a sparse-quantized
question-answering model from [SparseZoo](https://sparsezoo.neuralmagic.com/).
To use a model directory stored in `s3`, set `model_path` to `/opt/ml/model` in
the config and add `ModelDataUrl=<MODEL-S3-PATH>` to the `CreateModel` arguments.
SageMaker will automatically copy the files from the s3 path into `/opt/ml/model`
which the server can then read from.
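For illustration, a hedged sketch of what the config might look like when reading model files copied in from S3 (the field layout follows the bundled `config.yaml` shown later in this example):

```yaml
# hypothetical config.yaml variant for a model stored in S3;
# SageMaker copies the contents of ModelDataUrl into /opt/ml/model
models:
  - task: question_answering
    model_path: /opt/ml/model
    batch_size: 1
integration: sagemaker
```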

More information on the DeepSparse server and its configuration can be found
[here](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server#readme).


## Deploying to SageMaker
The following steps are required to provision and deploy DeepSparse to SageMaker
for inference:
* Build the DeepSparse-SageMaker `Dockerfile` into a local docker image
* Create an [Amazon ECR](https://aws.amazon.com/ecr/) repository to host the image
* Push the image to the ECR repository
* Create a SageMaker `Model` that reads from the hosted ECR image
* Build a SageMaker `EndpointConfig` that defines how to provision the model deployment
* Launch the SageMaker `Endpoint` defined by the `Model` and `EndpointConfig`

### Requirements
The listed steps can be completed using `python` and `bash`. The following
credentials, tools, and libraries are also required:
* The [`aws` cli](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) that is [configured](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html)
* The [ARN of an AWS role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) your user has access to that has full SageMaker and ECR permissions. In the following steps, we will refer to this as `ROLE_ARN`. It should take the form `"arn:aws:iam::XXX:role/service-role/XXX"`
* [Docker and the `docker` cli](https://docs.docker.com/get-docker/)
* The `boto3` python AWS sdk (`pip install boto3`)

### Building the DeepSparse-SageMaker Image Locally
The `Dockerfile` can be built from this directory in a bash shell using the following command.
The image will be tagged locally as `deepsparse-sagemaker-example`.

```bash
docker build -t deepsparse-sagemaker-example .
```
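Optionally, the image can be smoke-tested locally before pushing it to ECR. The sketch below is a hedged example, not part of this repository; it assumes the server exposes the SageMaker-style `/invocations` route and listens on a host reachable from outside the container, and it reuses the question-answering payload shown later in this guide.

```bash
# run the freshly built image locally (hypothetical smoke test)
docker run --rm -p 8080:8080 deepsparse-sagemaker-example

# in another shell, POST a question-answering payload to the invocations endpoint
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"question": "Where do I live?", "context": "I am a student and I live in Cambridge"}'
```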

### Creating an ECR Repository
The following code snippet can be used in Python to create an ECR repository.
The `region_name` can be swapped to a preferred region. The repository will be named
`deepsparse-sagemaker`. If the repository is already created, this step may be skipped.

```python
import boto3

ecr = boto3.client("ecr", region_name='us-east-1')
create_repository_res = ecr.create_repository(repositoryName="deepsparse-sagemaker")
```
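The response contains the repository URI, which can be noted for the push step (a hedged sketch of reading the standard `boto3` response):

```python
print(create_repository_res["repository"]["repositoryUri"])
```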

### Pushing the Local Image to the ECR Repository
Once the image is built and the ECR repository is created, the image can be pushed using the following
bash commands.

```bash
account=$(aws sts get-caller-identity --query Account | sed -e 's/^"//' -e 's/"$//')
region=$(aws configure get region)
ecr_account=${account}.dkr.ecr.${region}.amazonaws.com

aws ecr get-login-password --region $region | docker login --username AWS --password-stdin $ecr_account
fullname=$ecr_account/deepsparse-sagemaker:latest

docker tag deepsparse-sagemaker-example:latest $fullname
docker push $fullname
```

An abbreviated successful output will look like:
```
Login Succeeded
The push refers to repository [XXX.dkr.ecr.us-east-1.amazonaws.com/deepsparse-example]
3c2284f66840: Preparing
08fa02ce37eb: Preparing
a037458de4e0: Preparing
bafdbe68e4ae: Preparing
a13c519c6361: Preparing
6817758dd480: Waiting
6d95196cbe50: Waiting
e9872b0f234f: Waiting
c18b71656bcf: Waiting
2174eedecc00: Waiting
03ea99cd5cd8: Pushed
585a375d16ff: Pushed
5bdcc8e2060c: Pushed
latest: digest: sha256:XXX size: 3884
```

### Creating a SageMaker Model
A SageMaker `Model` can now be created referencing the pushed image.
The example model will be named `question-answering-example`.
As mentioned in the requirements, `ROLE_ARN` should be the string ARN of an AWS
role with full access to SageMaker.

```python
sm_boto3 = boto3.client("sagemaker", region_name="us-east-1")

region = boto3.Session().region_name
account_id = boto3.client("sts").get_caller_identity()["Account"]

image_uri = "{}.dkr.ecr.{}.amazonaws.com/deepsparse-sagemaker:latest".format(account_id, region)

create_model_res = sm_boto3.create_model(
    ModelName="question-answering-example",
    Containers=[
        {
            "Image": image_uri,
        },
    ],
    ExecutionRoleArn=ROLE_ARN,
    EnableNetworkIsolation=False,
)
```

More information about options for configuring SageMaker `Model` instances can
be found [here](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html).


### Building a SageMaker EndpointConfig
The `EndpointConfig` is used to set the instance type to provision, how many instances
to launch, scaling rules, and other deployment settings. The following code snippet
defines an endpoint with a single machine using an `ml.c5.large` CPU instance.

* [Full list of available instances](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html) (See Compute optimized (no GPUs) section)
* [EndpointConfig documentation and options](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateEndpointConfig.html)

```python
model_name = "question-answering-example" # model defined above
initial_instance_count = 1
instance_type = "ml.c5.large"

variant_name = "QuestionAnsweringDeepSparseDemo" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

production_variants = [
    {
        "VariantName": variant_name,
        "ModelName": model_name,
        "InitialInstanceCount": initial_instance_count,
        "InstanceType": instance_type,
    }
]

endpoint_config_name = "QuestionAnsweringExampleConfig" # ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}

endpoint_config = {
    "EndpointConfigName": endpoint_config_name,
    "ProductionVariants": production_variants,
}

endpoint_config_res = sm_boto3.create_endpoint_config(**endpoint_config)
```

### Launching a SageMaker Endpoint
Once the `EndpointConfig` is defined, the endpoint can be easily launched using
the `create_endpoint` command:

```python
endpoint_name = "question-answering-example-endpoint"
endpoint_res = sm_boto3.create_endpoint(
    EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
)
```

After creating the endpoint, its status can be checked by running the following.
Initially, the `EndpointStatus` will be `Creating`. Once the image is successfully
launched, it will change to `InService`. If there are any errors, it will
become `Failed`.

```python
from pprint import pprint
pprint(sm_boto3.describe_endpoint(EndpointName=endpoint_name))
```
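Alternatively, a `boto3` waiter can block until the endpoint is ready (a hedged sketch using the standard SageMaker waiter):

```python
waiter = sm_boto3.get_waiter("endpoint_in_service")
waiter.wait(EndpointName=endpoint_name)
```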


## Making a Request to the Endpoint
After the endpoint is in service, requests can be made to it through the
`invoke_endpoint` API. Inputs will be passed as a JSON payload.

```python
import json

sm_runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

body = json.dumps(
    dict(
        question="Where do I live?",
        context="I am a student and I live in Cambridge",
    )
)

content_type = "application/json"
accept = "text/plain"

res = sm_runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=body,
    ContentType=content_type,
    Accept=accept,
)

print(res["Body"].readlines())
```
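Because the body is returned as a JSON stream, it can also be decoded directly (a hedged sketch; the exact field names depend on the question-answering pipeline's output schema):

```python
import json

# read the streamed body once and decode it to inspect the returned fields
result = json.loads(res["Body"].read().decode("utf-8"))
print(result)
```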


### Cleanup
The model and endpoint can be deleted with the following commands:
```python
sm_boto3.delete_endpoint(EndpointName=endpoint_name)
sm_boto3.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sm_boto3.delete_model(ModelName=model_name)
```

## Next Steps
These steps create an invokable SageMaker inference endpoint powered by the DeepSparse
Engine. The `EndpointConfig` settings may be adjusted to set instance scaling rules based
on deployment needs.

More information on deploying custom models with SageMaker can be found
[here](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html).

Open an [issue](https://github.com/neuralmagic/deepsparse/issues)
or reach out to the [DeepSparse community](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ)
with any issues, questions, or ideas.
5 changes: 5 additions & 0 deletions examples/aws-sagemaker/config.yaml
@@ -0,0 +1,5 @@
models:
  - task: question_answering
    model_path: zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant-moderate
    batch_size: 1
integration: sagemaker
2 changes: 1 addition & 1 deletion examples/benchmark/check_correctness.py
@@ -45,7 +45,7 @@
import argparse

from deepsparse import compile_model, cpu
from deepsparse.benchmark_model.ort_engine import ORTEngine
from deepsparse.benchmark.ort_engine import ORTEngine
from deepsparse.utils import (
    generate_random_inputs,
    model_to_path,
6 changes: 3 additions & 3 deletions examples/benchmark/run_benchmark.py
@@ -145,13 +145,13 @@ def main():
        inputs, num_iterations, num_warmup_iterations, include_outputs=True
    )

    for dse_output, ort_output in zip(dse_results.outputs, ort_results.outputs):
        verify_outputs(dse_output, ort_output)

    print("ONNXRuntime", ort_results)
    print()
    print("DeepSparse Engine", dse_results)

    for dse_output, ort_output in zip(dse_results.outputs, ort_results.outputs):
        verify_outputs(dse_output, ort_output)


if __name__ == "__main__":
    main()