Commit

#0: Update llama3 perf.md
mtairum committed Jan 29, 2025
1 parent e9cc7d7 commit 27beb0c
Showing 1 changed file with 92 additions and 42 deletions: models/demos/llama3/PERF.md
@@ -4,51 +4,101 @@ Performance collected from [demo/demo.py](demo/demo.py) and accuracy collected from `test_llama_accuracy.py`.

Note that `test_llama_accuracy.py` parses the below to determine expected values +- 0.5.

Also note that all the performance metrics below were taken for a maximum generation of 200 tokens, i.e., 200 decode iterations.
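
As a rough illustration of how `test_llama_accuracy.py` might read its expected Top-1/Top-5 values out of these tables, here is a minimal parsing sketch. The helper name, regex, column order, and the ±0.5 check shown in the comments are assumptions for illustration, not the actual test code.

```python
import re

def parse_expected_accuracy(perf_md_text: str, section: str) -> dict:
    """Return {(model, device): (top1, top5)} for one PERF.md section.

    Illustrative only: assumes data rows look like
    | 1b | N150 | 88 | 98 | 95.7 | 41 |
    """
    expected = {}
    in_section = False
    for line in perf_md_text.splitlines():
        if line.startswith("## "):
            in_section = (line.strip() == f"## {section}")
            continue
        if in_section and re.match(r"^\|\s*\d+b\s*\|", line):
            cols = [c.strip() for c in line.strip().strip("|").split("|")]
            model, device = cols[0], cols[1]
            expected[(model, device)] = (float(cols[2]), float(cols[3]))
    return expected

# The accuracy test could then check measured values against these with a +-0.5 margin, e.g.:
# table = parse_expected_accuracy(open("models/demos/llama3/PERF.md").read(),
#                                 "LlamaOptimizations.performance")
# top1, top5 = table[("8b", "N150")]
# assert measured_top1 >= top1 - 0.5
```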

## LlamaOptimizations.performance

**Previous values (removed by this commit):**

This configuration uses bfp4 MLP FF1+FF3 for all models.

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) |
|-------|--------|-----------|-----------|---------------|
| 1b | N150 | 90 | 98 | 91.0 |
| 1b | N300 | 90 | 98 | 98.8 |
| 1b | T3K | 88 | 98 | 97.8 |
| 1b | TG | 88 | 99 | 51.0 |
| 3b | N150 | 91 | 98 | 49.2 |
| 3b | N300 | 90 | 98 | 56.8 |
| 3b | T3K | 91 | 98 | 54.5 |
| 3b | TG | 90 | 97 | 33.5 |
| 8b | N150 | 88 | 99 | 28.6 |
| 8b | N300 | 85 | 98 | 38.9 |
| 8b | T3K | 84 | 97 | 53.7 |
| 8b | TG | 86 | 98 | 29.5 |
| 11b | N300 | 87 | 98 | 38.6 |
| 11b | T3K | 88 | 98 | 52.6 |
| 11b | TG | 86 | 98 | 29.5 |
| 70b | T3K | 95 | 99 | 14.7 |
| 70b | TG | 95 | 100 | 12.7 |
**Updated values (added by this commit):**

This configuration uses bfp4 MLP FF1+FF3 for all models. **Batch size = 1 and prefill length = 128 tokens.**

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) | TTFT (ms) |
|-------|--------|-----------|-----------|---------------|-----------|
| 1b | N150 | 88 | 98 | 95.7 | 41 |
| 1b | N300 | 90 | 98 | 113.3 | 48 |
| 1b | T3K | 89 | 98 | 108.4 | 45 |
| 1b | TG | 88 | 99 | 51.0 | |
| 3b | N150 | 91 | 98 | 52.5 | 70 |
| 3b | N300 | 91 | 98 | 64.8 | 57 |
| 3b | T3K | 89 | 98 | 64.0 | 54 |
| 3b | TG | 90 | 97 | 33.5 | |
| 8b | N150 | 85 | 98 | 30.1 | 120 |
| 8b | N300 | 86 | 98 | 43.1 | 84 |
| 8b | T3K | 86 | 98 | 61.8 | 64 |
| 8b | TG | 86 | 98 | 29.5 | |
| 11b | N300 | 87 | 99 | 43.3 | 94 |
| 11b | T3K | 86 | 99 | 58.8 | 71 |
| 11b | TG | 86 | 98 | 29.5 | |
| 70b | T3K | 95 | 99 | 16.2 | 180 |
| 70b | TG | 95 | 100 | 12.7 | |
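
As a back-of-the-envelope reading of these two columns: Speed (t/s/u) is steady-state decode throughput per user and TTFT is the time to the first token, so single-user latency is roughly TTFT plus the remaining tokens divided by the decode speed. The helper below is a hypothetical sketch using the 8b / N150 row above and a 200-token generation; it is an estimate, not a measured number.

```python
def estimated_latency_s(ttft_ms: float, decode_tps_per_user: float, new_tokens: int) -> float:
    """Rough single-user latency: prefill (TTFT) plus steady-state decode of the remaining tokens."""
    return ttft_ms / 1000.0 + (new_tokens - 1) / decode_tps_per_user

# 8b on N150 in this table: 30.1 t/s/u decode, 120 ms TTFT.
# Generating 200 tokens then takes roughly:
print(f"~{estimated_latency_s(120, 30.1, 200):.1f} s")  # ~6.7 s
```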


## LlamaOptimizations.accuracy

**Previous values (removed by this commit):**

This configuration uses bfp4 MLP FF1+FF3 only for the 3.1-70B model.

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) |
|-------|--------|-----------|-----------|---------------|
| 1b | N150 | 89 | 98 | 86.8 |
| 1b | N300 | 88 | 100 | 98.1 |
| 1b | T3K | 90 | 99 | 97.5 |
| 1b | TG | 87 | 98 | 51.3 |
| 3b | N150 | 92 | 100 | 44.2 |
| 3b | N300 | 92 | 99 | 54.2 |
| 3b | T3K | 90 | 99 | 55.6 |
| 3b | TG | 91 | 98 | 33.6 |
| 8b | N150 | 91 | 99 | 23.6 |
| 8b | N300 | 92 | 99 | 34.5 |
| 8b | T3K | 91 | 99 | 49.8 |
| 8b | TG | 88 | 100 | 29.5 |
| 11b | N300 | 91 | 99 | 33.8 |
| 11b | T3K | 91 | 99 | 52.6 |
| 11b | TG | 88 | 100 | 29.5 |
| 70b | T3K | 95 | 99 | 14.7 |
| 70b | TG | 95 | 100 | 12.7 |
**Updated values (added by this commit):**

This configuration uses bfp4 MLP FF1+FF3 only for the 3.1-70B model. **Batch size = 1 and prefill length = 128 tokens.**

| Model | Device | Top-1 (%) | Top-5 (%) | Speed (t/s/u) | TTFT (ms) |
|-------|--------|-----------|-----------|---------------|-----------|
| 1b | N150 | 88 | 99 | 91.3 | 43 |
| 1b | N300 | 89 | 100 | 109.5 | 52 |
| 1b | T3K | 91 | 99 | 108.9 | 50 |
| 1b | TG | 87 | 98 | 51.3 | |
| 3b | N150 | 91 | 99 | 47.0 | 78 |
| 3b | N300 | 92 | 100 | 61.4 | 72 |
| 3b | T3K | 92 | 100 | 63.6 | 58 |
| 3b | TG | 91 | 98 | 33.6 | |
| 8b | N150 | 91 | 99 | 24.6 | 140 |
| 8b | N300 | 89 | 98 | 37.8 | 95 |
| 8b | T3K | 91 | 99 | 58.2 | 60 |
| 8b | TG | 88 | 100 | 29.5 | |
| 11b | N300 | 91 | 99 | 37.7 | 95 |
| 11b | T3K | 89 | 98 | 58.0 | 63 |
| 11b | TG | 88 | 100 | 29.5 | |
| 70b | T3K | 95 | 100 | 16.2 | 183 |
| 70b | TG | 95 | 100 | 12.7 | |
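
The Top-1/Top-5 columns in both configurations report token-level agreement with a reference: the percentage of positions where the reference token is the model's most likely prediction (Top-1) or among its five most likely (Top-5). A minimal sketch of that metric, assuming per-position logits and reference token ids are available as tensors; this is illustrative, not the accuracy test itself.

```python
import torch

def topk_token_accuracy(logits: torch.Tensor, ref_tokens: torch.Tensor, k: int) -> float:
    """Percentage of positions where the reference token is in the model's top-k predictions.

    logits:     [seq_len, vocab_size] model output at each position
    ref_tokens: [seq_len] reference token ids for the same positions
    """
    topk_ids = logits.topk(k, dim=-1).indices                 # [seq_len, k]
    hits = (topk_ids == ref_tokens.unsqueeze(-1)).any(dim=-1)  # [seq_len]
    return hits.float().mean().item() * 100.0

# top1 = topk_token_accuracy(logits, ref_tokens, k=1)
# top5 = topk_token_accuracy(logits, ref_tokens, k=5)
```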

## Llama long-context

This configuration uses bfp4 MLP FF1+FF3 for all models. **Batch size = 1 and prefill length = 64k tokens.**

| Model | Device | Speed (t/s/u) | TTFT (ms) |
|-------|--------|---------------|-----------|
| 1b | N150 | 59.2 | 20284 |
| 1b | N300 | 73.8 | 11049 |
| 1b | T3K | 75.3 | 5350 |
| 1b | TG | | |
| 3b | N150 | - | - |
| 3b | N300 | 36.0 | 23249 |
| 3b | T3K | 42.3 | 10917 |
| 3b | TG | | |
| 8b | N150 | - | - |
| 8b | N300 | 26.7 | 36612 |
| 8b | T3K | 39.7 | 16552 |
| 8b | TG | | |
| 11b | N300 | 26.7 | 36626 |
| 11b | T3K | 39.8 | 16541 |
| 11b | TG | | |
| 70b | T3K | 12.0 | 75290 |
| 70b | TG | | |
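
With a 64k-token prompt, TTFT is dominated by prefill, so it also gives a rough prefill-throughput figure: prompt length divided by TTFT. A back-of-the-envelope check using the 8b / T3K row above (an estimate under that assumption, not a reported number):

```python
prefill_tokens = 64 * 1024               # 64k-token prompt
ttft_s = 16552 / 1000                    # 8b on T3K, long-context table above
print(f"~{prefill_tokens / ttft_s:,.0f} prefill tokens/s")  # ~3,960 tokens/s
```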

## Llama 32-users

This configuration uses bfp4 MLP FF1+FF3 for all models. **Batch size = 32 and prefill length = 128 tokens.**

| Model | Device | Speed (t/s/u) | avg TTFT (ms) |
|-------|--------|---------------|---------------|
| 1b | N150 | 62.0 | 57 |
| 1b | N300 | 72.2 | 51 |
| 1b | T3K | 73.6 | 55 |
| 1b | TG | | |
| 3b | N150 | 37.9 | 87 |
| 3b | N300 | 47.4 | 72 |
| 3b | T3K | 49.3 | 75 |
| 3b | TG | | |
| 8b | N150 | 24.3 | 137 |
| 8b | N300 | 34.3 | 105 |
| 8b | T3K | 47.9 | 74 |
| 8b | TG | | |
| 11b | N300 | 34.5 | 98 |
| 11b | T3K | 47.6 | 80 |
| 11b | TG | | |
| 70b | T3K | 14.9 | 203 |
| 70b | TG | | |
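
Speed here is still reported per user (t/s/u) while 32 users decode concurrently, so aggregate decode throughput is the per-user figure multiplied by the batch size. For example, using the 8b / T3K row above:

```python
batch_size = 32
per_user_tps = 47.9                      # 8b on T3K, 32-user table above
print(f"~{batch_size * per_user_tps:.0f} aggregate decode tokens/s")  # ~1533 tokens/s
```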
