Test-time interventions for language models aim to enhance factual accuracy, reduce harmful outputs, and improve model efficiency, all while avoiding expensive retraining. However, existing interventions have been developed largely in isolation, while in practice multiple interventions must often be applied to the same model sequentially. It is unclear how these interventions interact and whether they can be combined effectively.
`lm-compose` provides an end-to-end framework for defining, executing, and evaluating intervention compositions on language models. Read our paper for more details!
- Compose sequential interventions across model editing, unlearning, and compression.
- Evaluate model performance and the efficacy of unlearning, knowledge editing, and compression.
- Load models via Hugging Face Transformers.
- Configure experiments via Hydra.
- Log results to Weights & Biases.
| Technique | Category | Introduced |
|---|---|---|
| Fine-tune | Editing | N/A |
| MEMIT | Editing | Mass-Editing Memory in a Transformer (Meng et al., 2022) |
| LoRA | Editing | LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021) |
| RMU | Unlearning | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning (Li et al., 2024) |
| GPTQ | Compression | GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers (Frantar et al., 2022) |
| AWQ | Compression | AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration (Lin et al., 2023) |
| Wanda | Compression | A Simple and Effective Pruning Approach for Large Language Models (Sun et al., 2023) |
| SparseGPT | Compression | SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot (Frantar et al., 2023) |
This framework has been tested with Python 3.11. We recommend using a virtual environment:

```bash
conda create -n lm-compose python=3.11
```
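Activate the environment before installing dependencies (standard conda usage):

```bash
conda activate lm-compose
```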
Dependencies can be installed via pip:

```bash
pip install -e .
pip install -e AutoAWQ
pip install -e AutoGPTQ
```
**Note:** We use modified implementations of AutoAWQ and AutoGPTQ to support applying quantization multiple times. Installing AutoGPTQ can take 20+ minutes.
**Note:** RMU requires additional datasets, which must be downloaded manually to `wmdp/data`. Installation instructions can be found in the WMDP repository.
This project configures experiments using Hydra. The categories of interventions to apply are set with the `interventions` config, and the specific intervention for each category is selected via the category name, e.g. `unlearn=rmu edit=memit compress=awq`. The core logic is implemented in `main.py`. Default values for the model and interventions are stored in `conf/config.yaml`; these defaults can be overridden by passing arguments to the `main.py` script.
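As a minimal sketch, running `main.py` with no arguments uses the defaults from `conf/config.yaml`, and any individual default can be overridden inline with standard Hydra syntax:

```bash
# Run with all defaults from conf/config.yaml
python main.py

# Override a single default (e.g. the model) inline
python main.py model_name=meta-llama/Meta-Llama-3-8B
```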
The following command applies the `memit` editing intervention and the `awq` quantization intervention with 8-bit weights, using Llama3-8B as the model:

```bash
python main.py model_name=meta-llama/Meta-Llama-3-8B interventions=[edit,compress] edit=memit compress=awq wbits=8
```
The following command applies the `rmu` unlearning intervention and the `memit` editing intervention. The default value stored in `conf/config.yaml` for `model_name` is used:

```bash
python main.py interventions=[unlearn,edit] unlearn=rmu edit=memit
```
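Because Hydra composes the final configuration from `conf/config.yaml` and the command-line overrides, you can inspect what an experiment will actually run before launching it with Hydra's built-in `--cfg` flag (a standard Hydra feature, assuming `lm-compose` does not disable it):

```bash
# Print the composed job config and exit without running the experiment
python main.py interventions=[unlearn,edit] unlearn=rmu edit=memit --cfg job
```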
TODO
`lm-compose` logs evaluation results to Weights & Biases. To enable logging, set `wandb` to `online` in `conf/config.yaml` or pass `wandb=online` as an argument to the `main.py` script. Users must also point to their own entity and project via the `wandb_entity` and `wandb_project` arguments. See `conf/config.yaml` for more details.

```bash
python main.py wandb=online wandb_entity=your_entity wandb_project=your_project interventions=[unlearn,edit,compress] unlearn=rmu edit=memit compress=sparsegpt
```
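Logging online also requires W&B credentials on the machine running the experiment; this is handled by the `wandb` CLI itself rather than by `lm-compose`:

```bash
# Authenticate with Weights & Biases (prompts for an API key on first use)
wandb login
```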
New interventions, tests, and bug fixes are more than welcome.
TODO