
Refactor of perplexity computation #1197

Merged: anmarques merged 98 commits into main from research/ppl_refactor on Nov 10, 2023

Conversation

anmarques (Member) commented on Aug 23, 2023

Refactor intended to simplify perplexity computation and add support for different datasets within the same codebase. Among the changes, these are the highlights:

  1. The perplexity class is now agnostic of the pipeline: it only sees predictions and targets. It takes an argument called "accumulate" that indicates whether to compute ppl for each sample separately or to accumulate it across samples (see the sketch after this list).
  2. Handle masking of tokens and logits in the perplexity_eval function. To avoid complications from the attention mask not being uniform within a batch, each sample is processed separately even when the pipeline is executed in batched mode.
  3. Add logic to split the wikitext data so that each sample has the same number of tokens.
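
To make item 1 concrete, here is a minimal sketch of what such a pipeline-agnostic metric might look like. The class and method names are illustrative only (this is not the actual deepsparse implementation); the point is that the metric receives nothing but logits and target token ids, and `accumulate` decides whether the negative log-likelihood is pooled over all tokens or reported per sample.

```python
import numpy as np


def _log_softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


class PerplexityMetric:
    """Illustrative pipeline-agnostic perplexity: sees only predictions and targets."""

    def __init__(self, accumulate: bool = False):
        self.accumulate = accumulate
        self._nll_sums = []      # summed negative log-likelihood per sample
        self._token_counts = []  # number of scored tokens per sample

    def add_batch(self, predictions: np.ndarray, targets: np.ndarray) -> None:
        # predictions: [num_tokens, vocab_size] logits for a single sample
        # targets:     [num_tokens] token ids those logits should predict
        log_probs = _log_softmax(predictions)
        nll = -log_probs[np.arange(len(targets)), targets]
        self._nll_sums.append(nll.sum())
        self._token_counts.append(len(targets))

    def compute(self):
        nlls = np.array(self._nll_sums)
        counts = np.array(self._token_counts)
        if self.accumulate:
            # Single perplexity over all tokens seen (wikitext/c4 style).
            return {"perplexity": float(np.exp(nlls.sum() / counts.sum()))}
        # Per-sample perplexities plus their mean (openai_humaneval style).
        per_sample = np.exp(nlls / counts)
        return {"perplexities": per_sample, "mean_perplexity": float(per_sample.mean())}
```

With `accumulate=True` this yields a single wikitext-style number; with `accumulate=False` it yields per-sample perplexities whose mean corresponds to the values reported in the testing plan below.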

Testing plan:

Verified ppl for base CodeGen 350M mono:

  • `deepsparse.transformers.eval_downstream --batch-size 16 --max-sequence-length 1024 -d openai_humaneval`
    Result: mean ppl: 3.60 (PyTorch: 3.60)

Verified ppl for OPT 1.3b base:

  • `deepsparse.transformers.eval_downstream --batch-size 16 --max-sequence-length 2048 -d wikitext`
    Result: mean ppl: 14.62 (PyTorch: 14.63)

NOTE: This pipeline was only tested for non-cached models. It should work with kv-cache models as well. Right now the pipeline is created with sequence_length=args.max_sequence_length and prompt_processing_sequence_length=args.max_sequence_length. As soon as the kv-cache issues around this case are resolved, we should re-run the ppl evaluation.
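
For reference, the configuration described in the note corresponds roughly to the following. The model path is a placeholder and the keyword names are taken from the note itself; they may differ between deepsparse versions, so treat this as a sketch rather than the exact call in eval_downstream.

```python
from deepsparse import Pipeline

max_sequence_length = 2048  # i.e. args.max_sequence_length in eval_downstream

pipeline = Pipeline.create(
    task="text-generation",
    model_path="zoo:...",  # placeholder model stub or local ONNX path
    sequence_length=max_sequence_length,
    prompt_processing_sequence_length=max_sequence_length,
)
```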

Update: Added support for the c4 dataset in a way that complies with both the subsets defined in SparseGPT and LLM-foundry. Validated on cached models as well.
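
Both the wikitext splitting in item 3 and the c4 support rest on the same idea: concatenate the raw text, tokenize it, and slice the token stream into samples of exactly the sequence length. A minimal sketch of that step follows, with an illustrative helper name (the real logic lives in the process_concatenated_datasets helper mentioned in the commit list below and additionally handles options such as max_text_length):

```python
from typing import List


def split_into_fixed_chunks(token_ids: List[int], sequence_length: int) -> List[List[int]]:
    # Slice one long concatenated token stream (e.g. all of wikitext joined
    # together) into samples of exactly `sequence_length` tokens, dropping the
    # trailing remainder so every sample has the same number of tokens.
    num_chunks = len(token_ids) // sequence_length
    return [
        token_ids[i * sequence_length : (i + 1) * sequence_length]
        for i in range(num_chunks)
    ]
```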

bfineran previously approved these changes Aug 23, 2023
Review thread on src/deepsparse/transformers/eval_downstream.py (outdated, resolved)
bfineran previously approved these changes Oct 23, 2023
dsikka (Contributor) left a comment:

Looks a lot better. Still need to verify the testing cases. Could you point to where those files are? They don't seem to be part of this PR.

bfineran previously approved these changes Nov 1, 2023
dbogunowicz previously approved these changes Nov 7, 2023
anmarques dismissed stale reviews from dbogunowicz and bfineran via 21c6f0d on November 8, 2023 at 15:26
anmarques requested a review from dsikka on November 8, 2023 at 22:31
anmarques merged commit 86490b0 into main on Nov 10, 2023
13 checks passed
anmarques deleted the research/ppl_refactor branch on November 10, 2023 at 15:45
dbogunowicz added a commit that referenced this pull request Nov 13, 2023
* Add input_tokens as optional output

* Refactor Perplexity class to only compute perplexity. All other task-specific processing is handled elsewhere

* Simplify perplexity evaluation. Evaluation takes place at batch size 1 only, so there is no need to consider batched execution. In addition, use input_tokens from the generation pipeline

* Splits wikitext at regular intervals of the same length as the sequence length

* Add argument for accumulation of negative log likelihood

* Accumulate likelihood for wikitext

* Simplification

* Add support for wikitext-style ppl evaluation

* Compute each batch as it arrives instead of storing everything until the compute method. This drastically reduces memory requirements

* Remove torch dependency

* Move split of dataset into helper function

* Quality fixes

* Remove debugging prints

* Remove debugging prints

* Incorporate fixes for kv-cache

* Include doc string for accumulate

* Add support for the trust_remote_code argument

* Add support for c4

* add a missing include_prompt_logits param

* Remove unnecessary capping at sequence length (it's incorrect for cached models)

* Simplify processing for concatenated datasets

* Fix kv cache update

* Fix kv cache update

* Quality fixes

* remove batch size from pipeline instantiation

* Rename to wikitext2

* Remove trust_remote_code argument

* Remove use_deepsparse_cache argument

* Change padding of output to left in order to match padding of input ids and attention mask

* Allow trust_remote_code to be passed as argument (in some cases tokenizer can be defined by custom code)

* Move process_concatenated_datasets to helpers file

* Added support for max_text_length to speed up processing of long datasets

* Rebase w/ main

* Rebase w/ main

* Fix typo

* Rebase

* Use max_length instead of max_new_tokens

* Rebase

* Added typing and docstring

* Added typing and docstring

* Define concatenated datasets

* Add warning about batch-size not being a supported argument for some datasets

* Add unit test for pipeline and generation in ppl eval

* Add lifecycle in docstring

* Add copyright

* Style fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Quality fixes

* Rebase

* Rebase

* Re-add unit test

* Style fix

* Update unit test

* Update unit test

---------

Co-authored-by: dbogunowicz <97082108+dbogunowicz@users.noreply.github.com>
Co-authored-by: Damian <damian@neuralmagic.com>
Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>