diff --git a/examples/huggingface-transformers/README.md b/examples/huggingface-transformers/README.md
index b3414a7499..8f1850a438 100644
--- a/examples/huggingface-transformers/README.md
+++ b/examples/huggingface-transformers/README.md
@@ -38,7 +38,7 @@ Question-Answering task. The current version of the pipeline supports only
 from pipelines import pipeline
 
 # SparseZoo model stub or path to ONNX file
-onnx_filepath="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate"
+onnx_filepath="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98"
 
 num_cores=None # uses all available CPU cores by default
 
@@ -70,7 +70,7 @@ benchmark.py -h`.
 To run a benchmark using the DeepSparse Engine with a pruned BERT model that uses all available CPU cores and batch size 1, run:
 ```bash
 python benchmark.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate \
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98 \
     --batch-size 1
 ```
 
@@ -94,7 +94,7 @@ also supported.
 Example command:
 ```bash
 python server.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98
 ```
 
 You can leave that running as a detached process or in a spare terminal.
@@ -142,10 +142,8 @@ Learn more at
 
 | Model Name | Stub | Description |
 |----------|-------------|-------------|
-| bert-pruned-moderate | zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate |This model is the result of pruning BERT base uncased on the SQuAD dataset. The sparsity level is 90% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
-| bert-6layers-aggressive-pruned| zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_6layers-aggressive_96 |This model is the result of pruning a modified BERT base uncased with 6 layers on the SQuAD dataset. The sparsity level is 95% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
+| bert-6layers-aggressive-pruned-96| zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_6layers-aggressive_96 |This model is the result of pruning a modified BERT base uncased with 6 layers on the SQuAD dataset. The sparsity level is 95% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
 | bert-pruned-conservative| zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-conservative |This model is the result of pruning BERT base uncased on the SQuAD dataset. The sparsity level is 80% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
-| pruned_6layers-moderate | zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_6layers-moderate |This model is the result of pruning a modified BERT base uncased with 6 layers on the SQuAD dataset. The sparsity level is 90% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs. The integration with Hugging Face's Transformers can be found [here](https://github.com/neuralmagic/sparseml/tree/main/integrations/huggingface-transformers).|
-| pruned-aggressive_94 | zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_94|This model is the result of pruning BERT base uncased on the SQuAD dataset. The sparsity level is 95% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
-| pruned_6layers-conservative| zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_6layers-conservative|This model is the result of pruning a modified BERT base uncased with 6 layers on the SQuAD dataset. The sparsity level is 80% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
-| bert-base|zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none |This model is the result of a BERT base uncased model fine-tuned on the SQuAD dataset for two epochs.|
+| pruned-aggressive_94 | zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_94|This model is the result of pruning BERT base uncased on the SQuAD dataset. The sparsity level is 95% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
+| bert-3layers-pruned-aggressive-89| zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_3layers-aggressive_89|This model is the result of pruning a modified BERT base uncased with 3 layers on the SQuAD dataset. The sparsity level is 89% uniformly applied to all encoder layers. Distillation was used with the teacher being the BERT model fine-tuned on the dataset for two epochs.|
+| bert-base|zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/base-none |This model is the result of a BERT base uncased model fine-tuned on the SQuAD dataset for two epochs.|
\ No newline at end of file
diff --git a/examples/huggingface-transformers/benchmark.py b/examples/huggingface-transformers/benchmark.py
index aa260a7a31..6a4b1c368e 100644
--- a/examples/huggingface-transformers/benchmark.py
+++ b/examples/huggingface-transformers/benchmark.py
@@ -66,7 +66,7 @@
 ##########
 Example for benchmarking on a pruned BERT model from sparsezoo with deepsparse:
 python benchmark.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate \
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98 \
 
 ##########
 Example for benchmarking on a local ONNX model with deepsparse:
diff --git a/examples/huggingface-transformers/server.py b/examples/huggingface-transformers/server.py
index d1026636ff..5b87693ca3 100644
--- a/examples/huggingface-transformers/server.py
+++ b/examples/huggingface-transformers/server.py
@@ -38,7 +38,7 @@
 ##########
 Example command for running using a model from sparsezoo:
 python server.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98
 """
 import argparse
 import json
diff --git a/examples/huggingface-transformers/squad_eval.py b/examples/huggingface-transformers/squad_eval.py
index 661f7c4c87..f5005ed5dd 100644
--- a/examples/huggingface-transformers/squad_eval.py
+++ b/examples/huggingface-transformers/squad_eval.py
@@ -48,7 +48,7 @@
 ##########
 Example command for evaluating a sparse BERT QA model from sparsezoo:
 python squad_eval.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98
 """
 
diff --git a/examples/huggingface-transformers/squad_inference.py b/examples/huggingface-transformers/squad_inference.py
index 85a55c9be7..d6f23a6536 100644
--- a/examples/huggingface-transformers/squad_inference.py
+++ b/examples/huggingface-transformers/squad_inference.py
@@ -60,7 +60,7 @@
 ##########
 Example command for running 1000 samples using a model from sparsezoo:
 python squad_inference.py \
-    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-moderate \
+    zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98 \
     --num-samples 1000
 """
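
For reference, below is a minimal sketch of how the new `pruned-aggressive_98` stub would be passed to this example's question-answering pipeline, following the README snippet changed above. The `"question-answering"` task name and the `model_path`/`num_cores` keyword arguments are assumptions inferred from the example code in this directory, not a confirmed signature; check `examples/huggingface-transformers/pipelines.py` before relying on it.

```python
# Minimal usage sketch for the updated SparseZoo stub (assumptions noted below).
from pipelines import pipeline  # helper module shipped with this example directory

# SparseZoo model stub or path to ONNX file (matches the README change above)
onnx_filepath = (
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/"
    "pruned-aggressive_98"
)
num_cores = None  # None -> use all available CPU cores

# ASSUMPTION: task name and keyword arguments follow the README snippet;
# verify against pipelines.py in this example before use.
qa_pipeline = pipeline(
    "question-answering",
    model_path=onnx_filepath,
    num_cores=num_cores,
)

answer = qa_pipeline(
    question="What does pruning remove from the model?",
    context="Pruning removes redundant weights so sparse BERT models run "
    "faster on CPUs with the DeepSparse Engine.",
)
print(answer)
```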