feat: add WMDP dataset integration #93

Open · wants to merge 2 commits into `main`
10 changes: 7 additions & 3 deletions README.md
@@ -21,14 +21,17 @@

## 📖 Overview

We provide efficient and streamlined implementations of the TOFU, MUSE unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 9+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.
We provide efficient and streamlined implementations of the TOFU, MUSE, and WMDP unlearning benchmarks while supporting 6 unlearning methods, 3+ datasets, 9+ evaluation metrics, and 6+ LLM architectures. Each of these can be easily extended to incorporate more variants.

We invite the LLM unlearning community to collaborate by adding new benchmarks, unlearning methods, datasets and evaluation metrics here to expand OpenUnlearning's features, gain feedback from wider usage and drive progress in the field.

---

### 📢 Updates

#### [Apr 9, 2025]
> **Collaborator comment:** need to update just before merge

- **New Benchmark!** Added support for the [WMDP](https://arxiv.org/abs/2403.03218) (Weapons of Mass Destruction Proxy) benchmark with cyber-forget-corpus and cyber-retain-corpus for unlearning hazardous knowledge.

#### [Apr 6, 2025]
⚠️⚠️ **IMPORTANT:** Be sure to run `python setup_data.py` immediately after merging the latest version. This is required to refresh the downloaded eval log files and ensure they're compatible with the latest evaluation metrics.
- **More Metrics!** Added 6 Membership Inference Attacks (MIA) (LOSS, ZLib, Reference, GradNorm, MinK, and MinK++), along with Extraction Strength (ES) and Exact Memorization (EM) as additional evaluation metrics.
@@ -59,10 +62,10 @@ We provide several variants for each of the components in the unlearning pipelin

| **Component** | **Available Options** |
|------------------------|----------------------|
| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/) |
| **Benchmarks** | [TOFU](https://arxiv.org/abs/2401.06121), [MUSE](https://muse-bench.github.io/), [WMDP](https://arxiv.org/abs/2403.03218) |
| **Unlearning Methods** | GradAscent, GradDiff, NPO, SimNPO, DPO, RMU |
| **Evaluation Metrics** | Verbatim Probability, Verbatim ROUGE, Knowledge QA-ROUGE, Model Utility, Forget Quality, TruthRatio, Extraction Strength, Exact Memorization, 6 MIA attacks |
| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits) |
| **Datasets** | MUSE-News (BBC), MUSE-Books (Harry Potter), TOFU (different splits), WMDP (cyber corpus) |
| **Model Families** | TOFU: LLaMA-3.2, LLaMA-3.1, LLaMA-2; MUSE: LLaMA-2; Additional: Phi-3.5, Phi-1.5, Gemma |

---
@@ -155,6 +158,7 @@ The scripts below execute standard baseline unlearning experiments on the TOFU a
```bash
bash scripts/tofu_unlearn.sh
bash scripts/muse_unlearn.sh
bash scripts/wmdp_unlearn.sh
```

The above scripts are not tuned and use default hyperparameter settings. We encourage you to tune your methods and add your final results in [`community/leaderboard.md`](community/leaderboard.md).
11 changes: 11 additions & 0 deletions configs/data/datasets/WMDP_cyber_forget.yaml
@@ -0,0 +1,11 @@
WMDP_cyber_forget:
handler: QADataset
args:
hf_args:
path: "cais/wmdp"
name: "wmdp-cyber"
split: "test"
question_key: "question"
answer_key: "answer"
choices_key: "choices"
max_length: 512
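
As a reference for what these keys point at, the record schema behind this config can be sketched in plain Python. The sample row below is made up, and the assumption that `answer` is an integer index into `choices` follows the `cais/wmdp` dataset card:

```python
# Hypothetical row mimicking what load_dataset("cais/wmdp", "wmdp-cyber",
# split="test") yields; the repo's QADataset handler is assumed to read the
# same three keys named in the config above.
example = {
    "question": "Which port does SSH use by default?",  # question_key
    "choices": ["21", "22", "80", "443"],               # choices_key
    "answer": 1,                                        # answer_key (index into choices, assumed)
}

correct_choice = example["choices"][example["answer"]]
print(correct_choice)  # 22
```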
9 changes: 9 additions & 0 deletions configs/data/datasets/WMDP_cyber_forget_corpus.yaml
@@ -0,0 +1,9 @@
WMDP_cyber_forget_corpus:
handler: PretrainingDataset
args:
hf_args:
path: "cais/wmdp-corpora"
name: "cyber-forget-corpus"
split: "train"
text_key: "text"
max_length: 2048
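
The corpus configs are consumed differently: a pretraining-style handler presumably just reads `text_key` from each record and caps the sequence at `max_length` tokens. A minimal sketch of that assumption, with a whitespace split standing in for the real tokenizer:

```python
def prepare_example(record, text_key="text", max_length=2048):
    """Read the configured text field and truncate to max_length tokens.

    Sketch only: the repo's PretrainingDataset handler is assumed to do
    something equivalent using the model's tokenizer.
    """
    tokens = record[text_key].split()  # stand-in for real tokenization
    return tokens[:max_length]

# Made-up record standing in for one row of
# load_dataset("cais/wmdp-corpora", "cyber-forget-corpus", split="train")
row = {"text": "example corpus passage about network protocols " * 4}
print(len(prepare_example(row, max_length=8)))  # 8
```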
11 changes: 11 additions & 0 deletions configs/data/datasets/WMDP_cyber_retain.yaml
> **Reviewer comment:** It's the same as `WMDP_cyber_forget.yaml`. Is it needed for something? If for testing performance disruption, IIRC MMLU was used.

> **Collaborator:** @ruidazeng I don't see the path existing for this dataset on HF: `cais/wmdp-corpora`.

> **@filyp (Apr 29, 2025):** You mean for MMLU? Yeah, it's not in `cais/wmdp-corpora` but in https://huggingface.co/datasets/cais/mmlu
>
> From what I understand, in the original setup, training is done on:
> - `"cais/wmdp-corpora"` - `"cyber-forget-corpus"`
> - `"cais/wmdp-corpora"` - `"cyber-retain-corpus"`
>
> And evaluation on:
> - `"cais/wmdp"` - `"wmdp-cyber"`
> - `"cais/mmlu"` - `"all"`

> **Reviewer:** I see you mentioned in #80 that evaluation won't be done here anyway, but only afterwards, so I guess `WMDP_cyber_forget.yaml` and `WMDP_cyber_retain.yaml` just aren't needed.

> **Collaborator:** Yes

@@ -0,0 +1,11 @@
WMDP_cyber_retain:
handler: QADataset
args:
hf_args:
path: "cais/wmdp"
name: "wmdp-cyber"
split: "test"
question_key: "question"
answer_key: "answer"
choices_key: "choices"
max_length: 512
9 changes: 9 additions & 0 deletions configs/data/datasets/WMDP_cyber_retain_corpus.yaml
@@ -0,0 +1,9 @@
WMDP_cyber_retain_corpus:
handler: PretrainingDataset
args:
hf_args:
path: "cais/wmdp-corpora"
name: "cyber-retain-corpus"
split: "train"
text_key: "text"
max_length: 2048
20 changes: 20 additions & 0 deletions configs/eval/wmdp.yaml
@@ -0,0 +1,20 @@
# @package eval.wmdp
# NOTE: the above line is not a comment, but sets the package for config. See https://hydra.cc/docs/upgrades/0.11_to_1.0/adding_a_package_directive/

defaults: # include all defined metrics files
- tofu_metrics: # Reusing TOFU metrics for WMDP evaluation
- forget_quality
- forget_Q_A_Prob
- forget_Q_A_ROUGE
- model_utility
- privleak
- extraction_strength

handler: WMDPEvaluator
output_dir: ${paths.output_dir} # set to default eval directory
metrics: {} # lists a mapping from each evaluation metric to its config
# populated through the first (@package) line in each metric config
overwrite: false
forget_split: test
holdout_split: test
retain_logs_path: null
49 changes: 49 additions & 0 deletions configs/experiment/unlearn/wmdp/default.yaml
@@ -0,0 +1,49 @@
# WMDP Unlearning Configuration

defaults:
- override /model: Llama-3.2-1B-Instruct
- override /trainer: RMU
- override /data: unlearn
- override /data/datasets@data.forget: WMDP_cyber_forget_corpus
- override /data/datasets@data.retain: WMDP_cyber_retain_corpus
- override /eval: wmdp

model:
model_args:
pretrained_model_name_or_path: meta-llama/Llama-3.2-1B-Instruct

forget_split: train
retain_split: train
holdout_split: test
retain_logs_path: null

eval:
wmdp:
forget_split: ${forget_split}
holdout_split: ${holdout_split}
retain_logs_path: ${retain_logs_path}
overwrite: true

data:
anchor: forget
forget:
WMDP_cyber_forget_corpus:
args:
hf_args:
name: "cyber-forget-corpus"
split: ${forget_split}
retain:
WMDP_cyber_retain_corpus:
args:
hf_args:
name: "cyber-retain-corpus"
split: ${retain_split}

trainer:
args:
warmup_epochs: 1.0
learning_rate: 1e-5
weight_decay: 0.01
num_train_epochs: 10

task_name: wmdp_cyber_unlearning
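
The `${forget_split}`-style references in this file are Hydra/OmegaConf interpolations: each top-level value is defined once and re-resolved wherever it is referenced. A toy resolver over a trimmed-down copy of the config above, purely to illustrate the mechanism (Hydra does this for real):

```python
import re

# Trimmed-down copy of the experiment config, interpolations intact.
cfg = {
    "forget_split": "train",
    "retain_split": "train",
    "holdout_split": "test",
    "eval": {
        "wmdp": {
            "forget_split": "${forget_split}",
            "holdout_split": "${holdout_split}",
        }
    },
}

def resolve(node, root):
    """Recursively substitute ${key} references against top-level keys."""
    if isinstance(node, dict):
        return {k: resolve(v, root) for k, v in node.items()}
    if isinstance(node, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: str(root[m.group(1)]), node)
    return node

resolved = resolve(cfg, cfg)
print(resolved["eval"]["wmdp"]["forget_split"])   # train
print(resolved["eval"]["wmdp"]["holdout_split"])  # test
```

Changing `forget_split=...` on the command line therefore updates both the data selection and the `eval.wmdp` block in one move.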
31 changes: 31 additions & 0 deletions docs/evaluation.md
@@ -36,6 +36,17 @@ python src/eval.py --config-name=eval.yaml \
- `--config-name=eval.yaml` - this is set by default, so it can be omitted
- `data_split=Books` - overrides the default MUSE data split (News). See [`configs/experiment/eval/muse/default.yaml`](../configs/experiment/eval/muse/default.yaml)

Run the WMDP benchmark evaluation on a checkpoint of a LLaMA 3.2 model:
```bash
python src/eval.py --config-name=eval.yaml \
experiment=eval/wmdp/default \
model=Llama-3.2-1B-Instruct \
model.model_args.pretrained_model_name_or_path=<LOCAL_MODEL_PATH> \
task_name=WMDP_EVAL
```
- `experiment=eval/wmdp/default` - sets the experiment to use [`configs/eval/wmdp/default.yaml`](../configs/eval/wmdp/default.yaml)
- The WMDP evaluation uses the cyber corpus by default and evaluates how well the model has unlearned hazardous knowledge

## Metrics

A metric takes a model and a dataset and computes statistics of the model over the datapoints, or takes other metrics and computes an aggregated score over the dataset.
@@ -240,3 +251,23 @@ metrics: {} # lists a mapping from each evaluation metric listed above to its co
output_dir: ${paths.output_dir} # set to default eval directory
forget_split: forget10
```

Example: WMDP evaluator config file ([`configs/eval/wmdp.yaml`](../configs/eval/wmdp.yaml))

```yaml
# @package eval.wmdp
defaults: # include all the metrics that come under the WMDP evaluator
- tofu_metrics: # WMDP reuses many of the same metrics as TOFU
- forget_quality
- forget_Q_A_Prob
- forget_Q_A_ROUGE
- model_utility
- privleak
- extraction_strength

handler: WMDPEvaluator
metrics: {} # lists a mapping from each evaluation metric listed above to its config
output_dir: ${paths.output_dir} # set to default eval directory
forget_split: test
holdout_split: test
```
4 changes: 3 additions & 1 deletion docs/links.md
@@ -32,6 +32,7 @@ Links to research papers and resources corresponding to implemented features in
|-----------|----------|
| TOFU | Paper [📄](https://arxiv.org/abs/2401.06121) |
| MUSE | Paper [📄](https://arxiv.org/abs/2407.06460) |
| WMDP | Paper [📄](https://arxiv.org/abs/2403.03218), Code [🐙](https://github.com/centerforaisafety/wmdp), Website [🌐](https://www.wmdp.ai/) |

---

@@ -57,6 +58,7 @@ Links to research papers and resources corresponding to implemented features in
### 🐙 Other GitHub Repositories
- [TOFU Benchmark (original)](https://github.com/locuslab/tofu)
- [MUSE Benchmark (original)](https://github.com/swj0419/muse_bench)
- [WMDP Benchmark (original)](https://github.com/centerforaisafety/wmdp)
- [Awesome LLM Unlearning](https://github.com/chrisliu298/awesome-llm-unlearning)
- [Awesome Machine Unlearning](https://github.com/tamlhp/awesome-machine-unlearning)
- [Awesome GenAI Unlearning](https://github.com/franciscoliu/Awesome-GenAI-Unlearning)
2 changes: 2 additions & 0 deletions src/evals/__init__.py
@@ -2,6 +2,7 @@
from omegaconf import DictConfig
from evals.tofu import TOFUEvaluator
from evals.muse import MUSEEvaluator
from evals.wmdp import WMDPEvaluator

EVALUATOR_REGISTRY: Dict[str, Any] = {}

@@ -31,3 +32,4 @@ def get_evaluators(eval_cfgs: DictConfig, **kwargs):
# Register Your benchmark evaluators
_register_evaluator(TOFUEvaluator)
_register_evaluator(MUSEEvaluator)
_register_evaluator(WMDPEvaluator)
6 changes: 6 additions & 0 deletions src/evals/wmdp.py
@@ -0,0 +1,6 @@
from evals.base import Evaluator


class WMDPEvaluator(Evaluator):
def __init__(self, eval_cfg, **kwargs):
super().__init__("WMDP", eval_cfg, **kwargs)
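
For context, this class only becomes reachable through the registry in `src/evals/__init__.py`. A self-contained sketch of that pattern; the real `Evaluator` base class has more to it, and keying the registry by class name is an assumption on my part:

```python
from typing import Any, Dict

EVALUATOR_REGISTRY: Dict[str, Any] = {}

class Evaluator:
    """Stub standing in for evals.base.Evaluator."""
    def __init__(self, name, eval_cfg, **kwargs):
        self.name = name
        self.eval_cfg = eval_cfg

class WMDPEvaluator(Evaluator):
    def __init__(self, eval_cfg, **kwargs):
        super().__init__("WMDP", eval_cfg, **kwargs)

def _register_evaluator(evaluator_class):
    # Assumed: keyed by class name, so the `handler: WMDPEvaluator` line in
    # configs/eval/wmdp.yaml can be looked up directly.
    EVALUATOR_REGISTRY[evaluator_class.__name__] = evaluator_class

_register_evaluator(WMDPEvaluator)
evaluator = EVALUATOR_REGISTRY["WMDPEvaluator"](eval_cfg={"forget_split": "test"})
print(evaluator.name)  # WMDP
```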