Rebase release/0.1 off of main for 0.1.1 (#53)
* Update icon name (#41)

* Update index.rst (#43)

adding license chip for the NM engine; substituting in sparseml for the deepsparse value in shields.io until we can debug why deepsparse won't work.

* Update README.md (#44)

removed comingsoon reference and added our active repo for the shields.io badge

* Makefile build argument for nightly builds (#45)

* docs updates (#46)

- correcting double-slash URL issue
- enhancing left nav for Help
- misc content updates

* Update for 0.1.1 release (#49)

- update python version to 0.1.1
- setup.py: add version parts and _VERSION_MAJOR_MINOR for more flexibility with dependencies between Neural Magic packages

* add compile engine from SparseZoo stub and benchmarking progress bar (#48)

* Sparsification update (#51)

* Sparsification update
- update sparsification descriptions and move to preferred verbiage
- update classification examples to resnet50

* update from comments

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update README.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Update docs/source/quicktour.md

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* update for changes found in sparsezoo

* fix links in index.rst for reviewed content

* update component overview and tagline from doc

* update from comments

Co-authored-by: Jeannie Finks <74554921+jeanniefinks@users.noreply.github.com>

* Add resnet50 benchmark script (#52)

* Add resnet50 benchmark script

* Add Ben's warning and ISA print

* Remove old file

* Batch splitting (#50)

* Add support for batch splitting

* Update for 0.1.1 release (#49)

- update python version to 0.1.1
- setup.py: add version parts and _VERSION_MAJOR_MINOR for more flexibility with dependencies between Neural Magic packages

* Add support for batch splitting

* Remove use_batch_splitting parameter

* Run style on changes

Co-authored-by: Mark Kurtz <mark@neuralmagic.com>
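
A sketch of the batch-splitting idea above: cut an oversized input batch into chunks matching the engine's compiled batch size, run each chunk, and re-join the outputs. This is a minimal illustration with a hypothetical `run_with_splitting` helper, not the engine's actual internals:

```python
import numpy

def run_with_splitting(engine, inputs, compiled_batch_size):
    # Hypothetical helper for illustration only.
    # Assumes the total batch size is a multiple of the compiled batch size.
    total = inputs[0].shape[0]
    chunk_outputs = []
    for start in range(0, total, compiled_batch_size):
        # Slice every input array along the batch axis
        chunk = [arr[start:start + compiled_batch_size] for arr in inputs]
        chunk_outputs.append(engine.run(chunk))
    # Concatenate each output position back into full-batch arrays
    return [numpy.concatenate(parts) for parts in zip(*chunk_outputs)]
```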

* Rm: blog link, 7x (#54)

* Update README.md and index.rst

Temporarily removing references to 7x and the blog; they will return after the post is live.

Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Jeannie Finks (NM) <74554921+jeanniefinks@users.noreply.github.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
5 people authored Feb 26, 2021
1 parent 6036701 commit 0b35b2c
Showing 12 changed files with 467 additions and 141 deletions.
3 changes: 2 additions & 1 deletion Makefile
@@ -1,6 +1,7 @@
.PHONY: build docs test

BUILDDIR := $(PWD)
+BUILD_ARGS := # set nightly to build nightly release
CHECKDIRS := examples tests src utils notebooks setup.py
PYCHECKGLOBS := 'examples/**/*.py' 'scripts/**/*.py' 'src/**/*.py' 'tests/**/*.py' 'utils/**/*.py' setup.py
DOCDIR := docs
@@ -43,7 +44,7 @@ docs:

# creates wheel file
build:
-python3 setup.py sdist bdist_wheel
+python3 setup.py sdist bdist_wheel $(BUILD_ARGS)

# clean package
clean:
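
With this argument in place, a nightly wheel can presumably be built by passing the flag through make — for example `make build BUILD_ARGS=nightly` — while a plain `make build` keeps producing the stable package; the exact value setup.py consumes isn't shown in this diff.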
103 changes: 68 additions & 35 deletions README.md
@@ -16,14 +16,13 @@ limitations under the License.

# ![icon for DeepSparse](https://raw.githubusercontent.com/neuralmagic/deepsparse/main/docs/source/icon-deepsparse.png) DeepSparse Engine

-### CPU inference engine that delivers unprecedented performance for sparse models
+### Neural network inference engine that delivers GPU-class performance for sparsified models on CPUs

<br>
<p>
<a href="https://github.com/neuralmagic/deepsparse/blob/main/LICENSE-NEURALMAGIC"><img alt="GitHub" src="https://img.shields.io/static/v1.svg?label=LICENSE&message=neural%20magic%20engine&color=purple&style=for-the-badge" height=25>
</a>
<a href="https://github.com/neuralmagic/deepsparse/blob/main/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/neuralmagic/comingsoon.svg?color=purple&style=for-the-badge" height=25>
<img alt="GitHub" src="https://img.shields.io/static/v1.svg?label=LICENSE&message=apache-2.0&color=purple&style=for-the-badge" height=25>
</a>
<a href="https://docs.neuralmagic.com/deepsparse/">
<img alt="Documentation" src="https://img.shields.io/website/http/docs.neuralmagic.com/deepsparse/index.html.svg?down_color=red&down_message=offline&up_message=online&style=for-the-badge" height=25>
@@ -47,87 +46,121 @@ limitations under the License.

## Overview

-The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of natural sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads. It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.
+The DeepSparse Engine is a CPU runtime that delivers GPU-class performance by taking advantage of sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads.
+It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.

-This repository includes package APIs along with examples to quickly get started learning about and actually running sparse models.
+This repository includes package APIs along with examples to quickly get started benchmarking and inferencing sparse models.

-### Related Products
-
-- [SparseZoo](https://github.com/neuralmagic/sparsezoo):
-  Neural network model repository for highly sparse models and optimization recipes
-- [SparseML](https://github.com/neuralmagic/sparseml):
-  Libraries for state-of-the-art deep neural network optimization algorithms,
-  enabling simple pipelines integration with a few lines of code
-- [Sparsify](https://github.com/neuralmagic/sparsify):
-  Easy-to-use autoML interface to optimize deep neural networks for
-  better inference performance and a smaller footprint
+## Sparsification
+
+Sparsification is the process of taking a trained deep learning model and removing redundant information from the over-precise and over-parameterized network, resulting in a faster and smaller model.
+Techniques for sparsification are all-encompassing, including everything from inducing sparsity using [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061) to enabling naturally occurring sparsity using [activation sparsity](http://proceedings.mlr.press/v119/kurtz20a.html) or [winograd/FFT](https://arxiv.org/abs/1509.09308).
+When implemented correctly, these techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics.
+For example, pruning plus quantization can give noticeable improvements in performance while recovering to nearly the same baseline accuracy.
+
+The Deep Sparse product suite builds on top of sparsification, enabling you to easily apply the techniques to your datasets and models using recipe-driven approaches.
+Recipes encode the directions for how to sparsify a model into a simple, easily editable format.
+
+- Download a sparsification recipe and sparsified model from the [SparseZoo](https://github.com/neuralmagic/sparsezoo).
+- Alternatively, create a recipe for your model using [Sparsify](https://github.com/neuralmagic/sparsify).
+- Apply your recipe with only a few lines of code using [SparseML](https://github.com/neuralmagic/sparseml).
+- Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the [DeepSparse Engine](https://github.com/neuralmagic/deepsparse).
+
+**Full Deep Sparse product flow:**
+
+<img src="https://docs.neuralmagic.com/docs/source/sparsification/flow-overview.svg" width="960px">

## Compatibility

The DeepSparse Engine ingests models in the [ONNX](https://onnx.ai/) format, allowing for compatibility with [PyTorch](https://pytorch.org/docs/stable/onnx.html), [TensorFlow](https://github.com/onnx/tensorflow-onnx), [Keras](https://github.com/onnx/keras-onnx), and [many other frameworks](https://github.com/onnx/onnxmltools) that support it. This reduces the extra work of preparing your trained model for inference to just one step of exporting.
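
That one-step export is usually a single call to the source framework's ONNX exporter. As a minimal sketch with PyTorch (the file name and input shape here are arbitrary choices, not part of this repo):

```python
import torch
import torchvision

# Any trained torch.nn.Module works; a torchvision ResNet-50 stands in here
model = torchvision.models.resnet50(pretrained=True).eval()

# Trace the model with a dummy input of the deployment shape and write ONNX
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy_input, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
)
```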

## Quick Tour

-To expedite inference and benchmarking on real models, we include the `sparsezoo` package. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference optimized models, trained on repeatable optimization recipes using state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).
+To expedite inference and benchmarking on real models, we include the `sparsezoo` package. [SparseZoo](https://github.com/neuralmagic/sparsezoo) hosts inference-optimized models, trained on repeatable sparsification recipes using state-of-the-art techniques from [SparseML](https://github.com/neuralmagic/sparseml).

### Quickstart with SparseZoo ONNX Models

-**MobileNetV1 Dense**
+**ResNet-50 Dense**

-Here is how to quickly perform inference with DeepSparse Engine on a pre-trained dense MobileNetV1 from SparseZoo.
+Here is how to quickly perform inference with DeepSparse Engine on a pre-trained dense ResNet-50 from SparseZoo.

```python
from deepsparse import compile_model
from sparsezoo.models import classification

batch_size = 64

# Download model and compile as optimized executable for your machine
-model = classification.mobilenet_v1()
+model = classification.resnet_50()
engine = compile_model(model, batch_size=batch_size)

# Fetch sample input and predict output using engine
inputs = model.data_inputs.sample_batch(batch_size=batch_size)
outputs, inference_time = engine.timed_run(inputs)
```
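
Since the outputs are plain numpy arrays, the predictions can be inspected directly — a small follow-up sketch, assuming the classifier's first output holds the class scores:

```python
import numpy

# Top-1 class label per image: argmax over the class dimension
predictions = numpy.argmax(outputs[0], axis=1)
print(predictions.shape)  # one label for each of the 64 images in the batch
```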

-**MobileNetV1 Optimized**
+**ResNet-50 Sparsified**

When exploring available optimized models, you can use the `Zoo.search_optimized_models` utility to find models that share a base.

-Let us try this on the dense MobileNetV1 to see what is available.
+Try this on the dense ResNet-50 to see what is available:

```python
from sparsezoo import Zoo
from sparsezoo.models import classification
-print(Zoo.search_optimized_models(classification.mobilenet_v1()))
+
+model = classification.resnet_50()
+print(Zoo.search_optimized_models(model))
```

Output:

```shell
-[Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/base-none),
- Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-conservative),
- Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate),
- Model(stub=cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned_quant-moderate)]
+[
+    Model(stub=cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none),
+    Model(stub=cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-conservative),
+    Model(stub=cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned-moderate),
+    Model(stub=cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate),
+    Model(stub=cv/classification/resnet_v1-50/pytorch/sparseml/imagenet-augmented/pruned_quant-aggressive)
+]
```

-Great. We can see there are two pruned versions targeting FP32, `conservative` at 100% and `moderate` at >= 99% of baseline accuracy. There is also a `pruned_quant` variant targeting INT8.
+We can see there are two pruned versions targeting FP32 and two pruned, quantized versions targeting INT8.
+The `conservative`, `moderate`, and `aggressive` tags recover to 100%, >=99%, and <99% of baseline accuracy, respectively.

-Let's say you want to evaluate best performance on FP32 and are okay with a small drop in accuracy, so we can choose `pruned-moderate` over `pruned-conservative`.
+For a version of ResNet-50 that recovers close to the baseline and is very performant, choose the `pruned_quant-moderate` model.
+This model will run [nearly 7x faster](https://neuralmagic.com/blog/benchmark-resnet50-with-deepsparse) than the baseline model on a compatible CPU (with the VNNI instruction set enabled).
+For hardware compatibility, see the Hardware Support section.

```python
from deepsparse import compile_model
from sparsezoo.models import classification
-batch_size = 64
-
-model = classification.mobilenet_v1(optim_name="pruned", optim_category="moderate")
-engine = compile_model(model, batch_size=batch_size)
-
-inputs = model.data_inputs.sample_batch(batch_size=batch_size)
-outputs, inference_time = engine.timed_run(inputs)
+import numpy
+
+batch_size = 64
+sample_inputs = [numpy.random.randn(batch_size, 3, 224, 224).astype(numpy.float32)]
+
+# run baseline benchmarking
+engine_base = compile_model(
+    model="zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/base-none",
+    batch_size=batch_size,
+)
+benchmarks_base = engine_base.benchmark(sample_inputs)
+print(benchmarks_base)
+
+# run sparse benchmarking
+engine_sparse = compile_model(
+    model="zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned_quant-moderate",
+    batch_size=batch_size,
+)
+if not engine_sparse.cpu_vnni:
+    print("WARNING: VNNI instructions not detected, quantization speedup not well supported")
+benchmarks_sparse = engine_sparse.benchmark(sample_inputs)
+print(benchmarks_sparse)
+
+print(f"Speedup: {benchmarks_sparse.items_per_second / benchmarks_base.items_per_second:.2f}x")
```
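
A note on the setup above: the sample inputs are random because only timing, not accuracy, is being measured, and the `cpu_vnni` check mirrors the engine's own warning — without VNNI the INT8 model still runs, but the quantization speedup is not well supported.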

-### Quickstart with custom ONNX models
+### Quickstart with Custom ONNX Models

We accept ONNX files for custom models, too. Simply plug in your model to compare performance with other solutions.
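
Compiling a local ONNX file follows the same pattern as the SparseZoo examples above — a sketch, with a placeholder path:

```python
from deepsparse import compile_model

batch_size = 1
onnx_filepath = "path/to/model.onnx"  # placeholder: any exported ONNX model

# compile_model also accepts a filesystem path to an ONNX model
engine = compile_model(onnx_filepath, batch_size=batch_size)
```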

2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -84,7 +84,7 @@
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"
html_logo = "icon-engine.png"
html_logo = "icon-deepsparse.png"

html_theme_options = {
'analytics_id': 'UA-128364174-1', # Provided by Google in your dashboard
File renamed without changes
83 changes: 46 additions & 37 deletions docs/source/index.rst
@@ -17,13 +17,16 @@
DeepSparse |version|
====================

-CPU inference engine that delivers unprecedented performance for sparse models.
+Neural network inference engine that delivers GPU-class performance for sparsified models on CPUs

.. raw:: html

<div style="margin-bottom:16px;">
<a href="https://github.com/neuralmagic/deepsparse/blob/main/LICENSE">
<img alt="GitHub" src="https://img.shields.io/github/license/neuralmagic/deepsparse.svg?color=purple&style=for-the-badge" height=25 style="margin-bottom:4px;">
<a href="https://github.com/neuralmagic/deepsparse/blob/main/LICENSE-NEURALMAGIC">
<img alt="GitHub" src="https://img.shields.io/static/v1.svg?label=LICENSE&message=neural%20magic%20engine&color=purple&style=for-the-badge" height=25 style="margin-bottom:4px;">
</a>
<a href="https://github.com/neuralmagic/deepsparse/blob/main/LICENSE">
<img alt="GitHub" src="https://img.shields.io/static/v1.svg?label=LICENSE&message=apache-2.0&color=purple&style=for-the-badge" height=25 style="margin-bottom:4px;">
</a>
<a href="https://docs.neuralmagic.com/deepsparse/index.html">
<img alt="Documentation" src="https://img.shields.io/website/http/docs.neuralmagic.com/deepsparse/index.html.svg?down_color=red&down_message=offline&up_message=online&style=for-the-badge" height=25 style="margin-bottom:4px;">
@@ -48,54 +51,59 @@ CPU inference engine that delivers unprecedented performance for sparse models.
Overview
========

-The DeepSparse Engine is a CPU runtime that delivers unprecedented performance by taking advantage of
-natural sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads.
-It is focused on model deployment and scaling machine learning pipelines,
-fitting seamlessly into your existing deployments as an inference backend.
-
-`This GitHub repository <https://github.com/neuralmagic/deepsparse />`_ includes package APIs along with examples to quickly get started learning about and
-actually running sparse models.
+The DeepSparse Engine is a CPU runtime that delivers GPU-class performance by taking advantage of sparsity within neural networks to reduce compute required as well as accelerate memory bound workloads.
+It is focused on model deployment and scaling machine learning pipelines, fitting seamlessly into your existing deployments as an inference backend.
+
+`This repository <https://github.com/neuralmagic/deepsparse>`_ includes package APIs along with examples to quickly get started benchmarking and inferencing sparse models.
+
+Sparsification
+==============
+
+Sparsification is the process of taking a trained deep learning model and removing redundant information from the over-precise and over-parameterized network, resulting in a faster and smaller model.
+Techniques for sparsification are all-encompassing, including everything from inducing sparsity using `pruning <https://neuralmagic.com/blog/pruning-overview/>`_ and `quantization <https://arxiv.org/abs/1609.07061>`_ to enabling naturally occurring sparsity using `activation sparsity <http://proceedings.mlr.press/v119/kurtz20a.html>`_ or `winograd/FFT <https://arxiv.org/abs/1509.09308>`_.
+When implemented correctly, these techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics.
+For example, pruning plus quantization can give noticeable improvements in performance while recovering to nearly the same baseline accuracy.
+
+The Deep Sparse product suite builds on top of sparsification, enabling you to easily apply the techniques to your datasets and models using recipe-driven approaches.
+Recipes encode the directions for how to sparsify a model into a simple, easily editable format.
+
+- Download a sparsification recipe and sparsified model from the `SparseZoo <https://github.com/neuralmagic/sparsezoo>`_.
+- Alternatively, create a recipe for your model using `Sparsify <https://github.com/neuralmagic/sparsify>`_.
+- Apply your recipe with only a few lines of code using `SparseML <https://github.com/neuralmagic/sparseml>`_.
+- Finally, for GPU-level performance on CPUs, deploy your sparse-quantized model with the `DeepSparse Engine <https://github.com/neuralmagic/deepsparse>`_.
+
+**Full Deep Sparse product flow:**
+
+<img src="https://docs.neuralmagic.com/docs/source/sparsification/flow-overview.svg" width="960px">

Compatibility
=============

-The DeepSparse Engine ingests models in the `ONNX <https://onnx.ai/ />`_ format,
-allowing for compatibility with `PyTorch <https://pytorch.org/docs/stable/onnx.html />`_,
-`TensorFlow <https://github.com/onnx/tensorflow-onnx />`_, `Keras <https://github.com/onnx/keras-onnx />`_,
-and `many other frameworks <https://github.com/onnx/onnxmltools />`_ that support it.
+The DeepSparse Engine ingests models in the `ONNX <https://onnx.ai>`_ format,
+allowing for compatibility with `PyTorch <https://pytorch.org/docs/stable/onnx.html>`_,
+`TensorFlow <https://github.com/onnx/tensorflow-onnx>`_, `Keras <https://github.com/onnx/keras-onnx>`_,
+and `many other frameworks <https://github.com/onnx/onnxmltools>`_ that support it.
This reduces the extra work of preparing your trained model for inference to just one step of exporting.

-Related Products
-================
-
-- `SparseZoo <https://github.com/neuralmagic/sparsezoo />`_:
-  Neural network model repository for highly sparse models and optimization recipes
-- `SparseML <https://github.com/neuralmagic/sparseml />`_:
-  Libraries for state-of-the-art deep neural network optimization algorithms,
-  enabling simple pipelines integration with a few lines of code
-- `Sparsify <https://github.com/neuralmagic/sparsify />`_:
-  Easy-to-use autoML interface to optimize deep neural networks for
-  better inference performance and a smaller footprint

Resources and Learning More
===========================

-- `SparseZoo Documentation <https://docs.neuralmagic.com/sparsezoo/ />`_
-- `SparseML Documentation <https://docs.neuralmagic.com/sparseml/ />`_
-- `Sparsify Documentation <https://docs.neuralmagic.com/sparsify/ />`_
-- `Neural Magic Blog <https://www.neuralmagic.com/blog/ />`_,
-  `Resources <https://www.neuralmagic.com/resources/ />`_,
-  `Website <https://www.neuralmagic.com/ />`_
+- `SparseZoo Documentation <https://docs.neuralmagic.com/sparsezoo>`_
+- `SparseML Documentation <https://docs.neuralmagic.com/sparseml>`_
+- `Sparsify Documentation <https://docs.neuralmagic.com/sparsify>`_
+- `Neural Magic Blog <https://www.neuralmagic.com/blog>`_,
+  `Resources <https://www.neuralmagic.com/resources>`_,
+  `Website <https://www.neuralmagic.com>`_

Release History
===============

Official builds are hosted on PyPi
-- stable: `deepsparse <https://pypi.org/project/deepsparse/ />`_
-- nightly (dev): `deepsparse-nightly <https://pypi.org/project/deepsparse-nightly/ />`_
+- stable: `deepsparse <https://pypi.org/project/deepsparse>`_
+- nightly (dev): `deepsparse-nightly <https://pypi.org/project/deepsparse-nightly>`_

Additionally, more information can be found via
-`GitHub Releases <https://github.com/neuralmagic/deepsparse/releases />`_.
+`GitHub Releases <https://github.com/neuralmagic/deepsparse/releases>`_.

.. toctree::
:maxdepth: 3
@@ -118,8 +126,9 @@ Additionally, more information can be found via
api/deepsparse

.. toctree::
-:maxdepth: 2
-:caption: Help and Support
+:maxdepth: 3
+:caption: Help

Bugs, Feature Requests <https://github.com/neuralmagic/deepsparse/issues>
Support, General Q&A <https://github.com/neuralmagic/deepsparse/discussions>
+Neural Magic Docs <https://docs.neuralmagic.com>
