```
accelerate==0.18.0
huggingface_hub==0.14.1
jieba==0.42.1
nltk==3.8.1
numpy==1.24.3
rouge_chinese==1.0.3
sentencepiece==0.1.99
torch==1.12.0+cu113
tqdm==4.65.0
```
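Assuming the pins above are saved as `requirements.txt`, they can be installed in one step. Note that the `+cu113` build of `torch` is served from the PyTorch wheel index rather than PyPI, so the extra index URL below (an assumption about a CUDA 11.3 setup) is needed:

```bash
pip install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cu113
```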
Run the following to unpack the preprocessed datasets and the bundled `transformers` 4.27.1 sources:

```bash
unzip datasets.zip
unzip transformers-4.27.1.zip
```
We put the preprocessed datasets in `datasets/`. The datasets for the six tasks are reorganized from PromptCBLUE, where the original source data is available for download.
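To sanity-check a task's data before training, here is a minimal inspection sketch (assuming each `test.json` is either a single JSON array or JSON Lines; the path and record fields follow the PromptCBLUE-style layout and are illustrative):

```python
import json

path = "datasets/CMeEE-V2/coupling_data/test.json"  # any of the six tasks works

with open(path, encoding="utf-8") as f:
    text = f.read()

# Try a single JSON document first, then fall back to JSON Lines.
try:
    records = json.loads(text)
except json.JSONDecodeError:
    records = [json.loads(line) for line in text.splitlines() if line.strip()]

print(f"{len(records)} records; first record keys: {list(records[0].keys())}")
```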
Build the structured test file (`test_struc.json`) that evaluation expects from each task's coupled test set:

```bash
for task_dataset in 'CMedCausal' 'CMeIE-V2' 'CHIP-MDCFNPC' 'IMCS-V2-SR' 'CMeEE-V2' 'IMCS-V2-NER'
do
    python src/evaluation/post_coupling.py datasets/$task_dataset/coupling_data/test.json datasets/$task_dataset/coupling_data/test_struc.json
done
```
## train
```bash
LLM='chatglm'  # or 'qwen'

# For each task, fine-tune LoRA adapters on the coupling, identification,
# and classification data (the last argument selects the GPU).
for task_dataset in 'CMedCausal' 'CMeIE-V2' 'CHIP-MDCFNPC' 'IMCS-V2-SR' 'CMeEE-V2' 'IMCS-V2-NER'
do
    bash ./src/ft_${LLM}_lora/train.sh $task_dataset coupling_data 0
    bash ./src/ft_${LLM}_lora/train.sh $task_dataset identify_data 0
    bash ./src/ft_${LLM}_lora/train.sh $task_dataset classify_data 0
done
```
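For reference, `train.sh` performs LoRA fine-tuning. A minimal sketch of an equivalent setup with `peft`, assuming ChatGLM-6B; the hyperparameters are illustrative and the script's actual values may differ:

```python
from transformers import AutoModel, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()

# Illustrative LoRA hyperparameters; train.sh's actual values may differ.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                 # low-rank dimension
    lora_alpha=32,                       # scaling factor
    lora_dropout=0.1,
    target_modules=["query_key_value"],  # ChatGLM's fused attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the adapter weights are trainable
```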
## evaluate

```bash
# Positional arguments: task, data variant, GPU id, decoding method (ACD),
# and decoding hyperparameters; see evaluate.sh for their exact meaning.
bash ./src/ft_${LLM}_lora/evaluate.sh IMCS-V2-SR coupling_data 0 ACD 2 0.05 0.5 8 True
```
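At decoding time, the method contrasts the output distributions of different fine-tuned models. Below is a minimal, generic sketch of one contrastive-decoding step; the `(1 + alpha)` weighting, the model names, and `alpha` itself are illustrative assumptions, not the repo's exact formulation:

```python
import torch

def contrastive_logits(expert_logits, amateur_logits, alpha=0.5):
    """Generic contrastive decoding for one step: amplify what the expert
    model predicts and the amateur model does not. Illustrative only."""
    return (1 + alpha) * expert_logits - alpha * amateur_logits

# Toy example with random logits over a 10-token vocabulary.
expert = torch.randn(10)
amateur = torch.randn(10)
next_token = contrastive_logits(expert, amateur).argmax().item()
print(next_token)
```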
Reproduce the results by running `bash run.sh`.
This code builds on the code of PromptCBLUE and self-llm. Thanks to these great projects.
If you find this project useful in your research, please cite the following paper:

```bibtex
@inproceedings{xu2024mitigating,
  title={Mitigating Hallucinations of Large Language Models in Medical Information Extraction via Contrastive Decoding},
  author={Xu, Derong and Zhang, Ziheng and Zhu, Zhihong and Lin, Zhenxi and Liu, Qidong and Wu, Xian and Xu, Tong and Zhao, Xiangyu and Zheng, Yefeng and Chen, Enhong},
  booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
  pages={7744--7757},
  year={2024}
}
```