Code for the EMNLP 2023 paper "Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning" (arXiv).
We propose to retrieve examples similar to a source query from the unlabeled target corpus to serve as its context, and to perform adaptive in-context learning by concatenating the source query and the retrieved target contexts into a single input prompt. We also propose a domain-adaptive in-context learning (DAICL) framework for different LM architectures, including encoder-only and decoder-only models.
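As a minimal sketch of this idea (a hypothetical helper, not the repo's actual code): the retrieved unlabeled target-domain sentences are simply prepended to the labeled source query to form one prompt.

```python
def build_daicl_prompt(target_contexts, source_query):
    """Concatenate retrieved unlabeled target-domain sentences (the context)
    with a labeled source-domain query into a single input prompt."""
    return "\n".join(list(target_contexts) + [source_query])

# Toy example: target domain is social media, source domain is newswire.
prompt = build_daicl_prompt(
    ["@user stocks r mooning rn", "just bought $TSLA lol"],  # retrieved target tweets
    "Shares of Tesla rose 4% on Monday.",                    # labeled source sentence
)
```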
NER:
- WNUT2016 Data
- CoNLL2003 Data (Other)
- Financial NER Data
- WNUT2017 Data
- BioNER Data
- You can also download our processed NER data Here
SA:
- Amazon Benchmark 2-classes Data
- Amazon Review 3-classes Data
- You can also download our processed SA data Here
Retrieve cross-domain demonstrations:
```shell
python NER_Datasets/retrieval_cross_domain.py
```
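The retrieval step pairs each source example with its most similar target-domain sentences. A toy stand-in for that similarity search, using bag-of-words cosine similarity (the actual retriever in `retrieval_cross_domain.py` may use dense embeddings or BM25 instead):

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(source_query, target_corpus, k=2):
    """Return the k target sentences most similar to the source query."""
    qv = Counter(source_query.lower().split())
    scored = sorted(target_corpus,
                    key=lambda s: cosine(qv, Counter(s.lower().split())),
                    reverse=True)
    return scored[:k]
```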
Train LLaMA:
```shell
CUDA_VISIBLE_DEVICES=0 python finetune_new.py \
    --base_model 'yahma/llama-7b-hf' \
    --data_path 'NER_Datasets/llama_train_data/gold_demo/wnut16_gold_demo.json' \
    --output_dir 'model/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0' \
    --batch_size 256 \
    --micro_batch_size 4 \
    --num_epochs 5 \
    --learning_rate 3e-4 \
    --cutoff_len 512 \
    --val_set_size 1000 \
    --warmup_steps 20 \
    --logging_steps 4 \
    --eval_steps 50 \
    --save_steps 50 \
    --lora_r 16 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --lora_target_modules '[q_proj,k_proj,v_proj,o_proj]' \
    --train_on_inputs \
    --add_eos_token
```
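A note on the batch arguments: in alpaca-lora-style training scripts, `--batch_size` is the effective batch size and `--micro_batch_size` is the per-step batch that fits in GPU memory, related through gradient accumulation (this describes the alpaca-lora convention, which we assume this script follows):

```python
batch_size = 256       # effective batch size (--batch_size)
micro_batch_size = 4   # per-step batch that fits in GPU memory (--micro_batch_size)

# Gradients are accumulated over this many micro-batches before each optimizer step.
gradient_accumulation_steps = batch_size // micro_batch_size
print(gradient_accumulation_steps)  # 64
```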
LLaMA Inference:
```shell
CUDA_VISIBLE_DEVICES=1 python eval_generate.py \
    --load_8bit \
    --base_model "yahma/llama-7b-hf" \
    --lora_weights 'model/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0' \
    --eval_path "NER_Datasets/llama_inf_data/gold_demo/wnut16_gold_demo.json" \
    --eval_result_path "NER_Datasets/llama_inf_data/eval_result/ner_conll03-wnut16_gold_demo_lr3e-4_r16_alpha16_toi_aet_0/wnut16_gold_demo.txt" \
    --eval_batch_size 3
```
Run RoBERTa NER:
```shell
CUDA_VISIBLE_DEVICES=0 python roberta_ner/train.py \
    --config config/conll03-wnut16_cl_kl.yaml
```
Please cite the following paper if you find the resources in this repository useful.
```bibtex
@inproceedings{long-etal-2023-adapt,
    title = "Adapt in Contexts: Retrieval-Augmented Domain Adaptation via In-Context Learning",
    author = "Long, Quanyu and
      Wang, Wenya and
      Pan, Sinno",
    editor = "Bouamor, Houda and
      Pino, Juan and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.402",
    pages = "6525--6542",
}
```
This project is implemented based on the alpaca-lora and CLNER source code.