From cfe1baaa24e6154b9c8162f6bbce71da91c02079 Mon Sep 17 00:00:00 2001
From: Michael Goin
Date: Fri, 12 Apr 2024 12:18:57 -0600
Subject: [PATCH 4/5] Update README.md
---
src/deepsparse/transformers/README.md | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/deepsparse/transformers/README.md b/src/deepsparse/transformers/README.md
index f1e38aee85..9630086df9 100644
--- a/src/deepsparse/transformers/README.md
+++ b/src/deepsparse/transformers/README.md
@@ -151,9 +151,12 @@ https://sparsezoo.neuralmagic.com/?useCase=text_generation)
 ```python
 from deepsparse import Pipeline

-opt_pipeline = Pipeline.create(task="opt", model_path="zoo:opt-1.3b-opt_pretrain-quantW8A8")
+llama_pipeline = Pipeline.create(
+    task="text-generation",
+    model_path="zoo:llama2-7b-ultrachat200k_llama2_pretrain-pruned50_quantized"
+)

-inference = opt_pipeline("Who is the president of the United States?")
+inference = llama_pipeline("Who is the president of the United States?")

 >> 'The president of the United States is the head of the executive branch of government...'
 ```
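Note: the pipeline added in this hunk also accepts generation arguments at call time. Below is a minimal sketch of passing `max_new_tokens` and reading the decoded text, mirroring the `pipeline(prompt, max_new_tokens=75).generations[0].text` access pattern shown in the main README patch further down; the SparseZoo stub is the same one introduced here and is downloaded on first use.

```python
from deepsparse import Pipeline

# Same llama2 SparseZoo stub introduced in this patch; fetching it requires network access.
llama_pipeline = Pipeline.create(
    task="text-generation",
    model_path="zoo:llama2-7b-ultrachat200k_llama2_pretrain-pruned50_quantized"
)

# max_new_tokens bounds the generated length; generations[0].text holds the decoded output,
# matching the usage shown in the main README example below.
output = llama_pipeline("Who is the president of the United States?", max_new_tokens=75)
print(output.generations[0].text)
```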
@@ -163,7 +166,7 @@ Spinning up:
 ```bash
 deepsparse.server \
   --task text-generation \
-  --model_path zoo:opt-1.3b-opt_pretrain-pruned50_quantW8A8
+  --model_path zoo:llama2-7b-ultrachat200k_llama2_pretrain-pruned50_quantized
 ```

 Making a request:
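The "Making a request:" snippet itself falls outside this hunk. A minimal sketch of what such a request could look like, assuming the server listens on deepsparse.server's default port 5543 and that the route and JSON field below (both assumptions, not shown in this patch) match the text-generation endpoint:

```python
import requests

# Assumed endpoint: 5543 is the deepsparse.server default port, but the exact route
# and payload schema are assumptions not confirmed by this patch.
url = "http://localhost:5543/v2/models/text_generation/infer"
payload = {"prompt": "Who is the president of the United States?"}

response = requests.post(url, json=payload)
print(response.text)
```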
From 1ff44cddd797625ae9da44b2af0fadb7a7ace269 Mon Sep 17 00:00:00 2001
From: Michael Goin
Date: Tue, 16 Apr 2024 14:33:58 -0400
Subject: [PATCH 5/5] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 59699a945a..e19ef6c39d 100644
--- a/README.md
+++ b/README.md
@@ -74,7 +74,7 @@ print(pipeline(prompt, max_new_tokens=75).generations[0].text)
 # Sparsity is the property of a matrix or other data structure in which a large number of elements are zero and a smaller number of elements are non-zero. In the context of machine learning, sparsity can be used to improve the efficiency of training and prediction.
 ```

-> [Check out the `TextGeneration` documentation for usage details.](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md)
+Check out the [`TextGeneration` documentation for usage details](https://github.com/neuralmagic/deepsparse/blob/main/docs/llms/text-generation-pipeline.md) and get the [latest sparsified LLMs on our HF Collection](https://huggingface.co/collections/neuralmagic/deepsparse-sparse-llms-659d61e81774dd48343642bf).

 ### Sparsity :handshake: Performance
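The new sentence also points readers to the Hugging Face collection of sparsified LLMs. A sketch of loading one of those checkpoints with the `TextGeneration` pipeline referenced above, assuming DeepSparse's `hf:` model stubs resolve Hugging Face repo ids (the repo name below is a hypothetical placeholder, not a real entry in the collection):

```python
from deepsparse import TextGeneration

# Hypothetical repo id standing in for an entry from the linked HF collection;
# the "hf:" stub prefix is an assumption about how DeepSparse loads HF-hosted deployments.
pipeline = TextGeneration(model="hf:neuralmagic/example-sparse-llm")

print(pipeline("Explain sparsity in one sentence.", max_new_tokens=50).generations[0].text)
```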