Zeming Wei, Yihao Zhang, and Meng Sun.
Accepted by SETTA 2024. Preprint: https://arxiv.org/abs/2409.04831
- Download the SST2, AGnews, mrpc, QNLI, RTE, and WMT datasets and move them into the folder ./data. You can directly copy the data folder from BatchICL.
- Edit the paths to your LLMs in paths.py.
- Calculate the accuracy with eval_acc.py. Example:
python eval_acc.py --model vicuna --task all --shots 20 --test-example 250
- Create the folder ./results and run the mutation testing with main.py. The log will be saved in ./results. Example:
python main.py --model vicuna --mutants 20 --test-example 250 --shots 20 --task SST2
- Calculate Standard and Group-wise Mutation Scores with analysis.py and mutator_analysis.py (complete log for all models and tasks required). Example:
python analysis.py --num 50
python mutator_analysis.py
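Because the analysis scripts need complete logs for all models and tasks, it can help to generate every main.py run in one place. The following is a minimal sketch of such a driver, not part of the repository. It reuses the flags from the main.py example above. The helper names `build_command` and `run_all` are hypothetical, and it assumes `--task` accepts each dataset name exactly as spelled in the download step, which may not hold for every task.

```python
import subprocess
import sys
from pathlib import Path

# Task names copied from the dataset list in the download step; whether
# main.py accepts each exact spelling for --task is an assumption.
TASKS = ["SST2", "AGnews", "mrpc", "QNLI", "RTE", "WMT"]


def build_command(model: str, task: str, mutants: int = 20,
                  test_examples: int = 250, shots: int = 20) -> list[str]:
    """Build one main.py invocation with the flags from the example above."""
    return [
        sys.executable, "main.py",  # current interpreter instead of bare "python"
        "--model", model,
        "--mutants", str(mutants),
        "--test-example", str(test_examples),
        "--shots", str(shots),
        "--task", task,
    ]


def run_all(models: list[str], dry_run: bool = True) -> list[list[str]]:
    """Create ./results, then print (and optionally run) one command per model/task pair."""
    Path("results").mkdir(exist_ok=True)
    commands = [build_command(m, t) for m in models for t in TASKS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing run so logs stay consistent.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # "vicuna" is the only model name shown in the examples above.
    run_all(["vicuna"], dry_run=True)
```

With `dry_run=True` the script only prints the commands, which makes it easy to check them before starting long runs. Set `dry_run=False` to execute them from the repository root.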
@InProceedings{wei2024mile,
title = {MILE: A Mutation Testing Framework of In-Context Learning Systems},
author = {Wei, Zeming and Zhang, Yihao and Sun, Meng},
booktitle = {SETTA},
year = {2024}
}