# Explaining and Improving Contrastive Decoding by Extrapolating the Probabilities of a Huge and Hypothetical LM

<p align="center"><img src="https://github.com/amazon-science/llm-asymptotic-decoding/blob/master/AP_sampling/imgs/APD_first_figure.png?raw=true" width="1586" height="1402"></p>

## Introduction

To overcome the limitations of contrastive decoding (CD), we propose a new unsupervised decoding method called **A**symptotic **P**robability **D**ecoding (APD). APD explicitly extrapolates the probability curves from LMs of different sizes to infer the asymptotic probabilities of an infinitely large LM, without incurring more inference cost than CD. On FactualityPrompts, an open-ended text generation benchmark, sampling with APD significantly boosts factuality compared to CD sampling and its variants, and achieves state-of-the-art results for Pythia 6.9B and OPT 6.7B. Furthermore, on five commonsense QA datasets, APD is often significantly better than CD and achieves an effect similar to using a larger LLM. For example, the perplexity of APD on top of Pythia 6.9B is even lower than the perplexity of Pythia 12B on CommonsenseQA and LAMBADA.
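To illustrate the core idea of extrapolating probability curves (a minimal sketch only, using an assumed curve family, not the paper's actual estimator): given a token's log-probability under several model sizes, fit a simple curve against a function of model size and read off the limit as size goes to infinity.

```python
import numpy as np

def asymptotic_logprob(sizes, logprobs):
    """Extrapolate a token's log-probability to an infinitely large LM.

    Fits log p = a + b / log(size) by least squares; the intercept `a`
    is the extrapolated value as size -> infinity. This curve family is
    an illustrative assumption, not the exact one used by APD.
    """
    x = 1.0 / np.log(np.asarray(sizes, dtype=float))
    y = np.asarray(logprobs, dtype=float)
    b, a = np.polyfit(x, y, 1)  # fit y ~ b * x + a
    return a  # limit of y as x -> 0, i.e., size -> infinity

# Toy numbers: log-probabilities that improve with model size.
sizes = [70e6, 410e6, 1.4e9, 6.9e9]          # Pythia-like parameter counts
logprobs = [-3.11, -3.01, -2.95, -2.88]
est = asymptotic_logprob(sizes, logprobs)    # extrapolated log-prob
```

The extrapolated value lies above every observed log-probability, which is the intended behavior when the probability of a (correct) token keeps rising with model size.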
## Computational Environment

You can reproduce our Python environment using

```
conda create --name <env> --file requirement.txt
```

Most of the code can also be run with older versions of Hugging Face Transformers (e.g., the version pinned in REAL_sampling/requirement.txt), except for running the Qwen LLM.
## How to run APD

To learn how to use APD and/or REAL sampling with Hugging Face Transformers, see the following example code:

```
./src/example_APD_REAL.py
```
### Run FactualityPrompts

To evaluate the generation results, first follow ../FactualityPrompt/README.md to download the data and update the paths in ../FactualityPrompt/src/const.py; then run the script below.

If you have more than 7 GPUs in your machine, you can simply run the following file to generate the continuations:

```
./bin/continue_wiki_prompt_loop_eval.sh
```
### Run Question Answering Datasets

Step 1: Run the dataset download code in src/QA/dataset_preparation (for ARC, we concatenate the easy and challenge JSON outputs).

Step 2: Test the APD models on the datasets. For datasets with only positive answers (e.g., LAMBADA, SQuAD, and MultiRC), use src/QA/dataset_preparation/test_squad_dataset.py. For datasets with negative answers (e.g., QASC, ARC, SocialIQA, and CommonsenseQA), use src/QA/dataset_preparation/test_neg_dataset.py. If you also want to test the APD-on-the-fly baseline, use test_squad_dataset_online_all.py and test_neg_dataset_online_all.py instead. Remember to change the paths in each file accordingly.

Step 3: Run analyze_results.py or analyze_results_online_all.py to collect the results. For datasets that have negative answers and an accuracy metric, set have_acc to 1.
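For datasets with negative answers, the accuracy computation boils down to scoring every candidate answer with the (adjusted) LM and checking whether the gold answer receives the best score. A generic sketch of that logic (the function names and score dictionaries are illustrative, not the repository's actual API):

```python
def pick_best_answer(scores):
    """scores: {answer: total negative log-likelihood under the LM}.
    Lower NLL (i.e., lower perplexity) wins."""
    return min(scores, key=scores.get)

def accuracy(examples):
    """examples: list of (gold_answer, {answer: nll}) pairs."""
    correct = sum(pick_best_answer(s) == gold for gold, s in examples)
    return correct / len(examples)

# Toy data: the third example is scored wrong on purpose.
examples = [
    ("paris", {"paris": 4.2, "london": 5.1, "rome": 5.9}),
    ("blue",  {"red": 3.3, "blue": 3.0, "green": 4.4}),
    ("cat",   {"cat": 6.0, "dog": 5.5, "bird": 7.1}),
]
```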
## How to Train ALM' (in order to use APD)

Put your text file into "data/raw/".

Change INPUT_FILE, data_folder_name, and OUTPUT_MODEL_FOLDER in bin/finetune_ALM.sh and run it (assuming you have more than 7 GPUs in your machine).

Note that our current implementation first saves the probabilities and logits of the top tokens from the various LLMs into a cache, which takes a lot of disk space, and loading these probabilities also requires a lot of CPU memory. For example, after processing ~270M of Wikipedia text with 5 OPT models, we store a 70G tensor plus a 52G dataset cache, and our server has around 750G of CPU memory.
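The reason the cache stays tractable at all is that only the top-k token probabilities and indices are stored per position, rather than the full vocabulary distribution. A simplified sketch of that idea (the actual on-disk format is defined by src/collect_top_prob.py; this helper is hypothetical):

```python
import torch

def cache_top_k(logits: torch.Tensor, k: int = 20):
    """Keep only the top-k probabilities and token indices per position.

    logits: (seq_len, vocab_size). Storing k values instead of the full
    vocabulary shrinks the cache by roughly vocab_size / k, at the cost
    of discarding the tail of the distribution.
    """
    probs = torch.softmax(logits, dim=-1)
    top_p, top_idx = torch.topk(probs, k, dim=-1)
    # Half-precision probabilities and int32 indices halve storage again.
    return {"probs": top_p.half(), "idx": top_idx.int()}

# torch.save(cache_top_k(logits), "cache/chunk_0.pt")  # one file per text chunk
```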
```bash
#!/bin/bash
#top_k=10
bptt=1024

#data_folder_name="wiki2021_1e4_Pythia"
#data_folder_name="ROC_gen_1000_p095_Pythia"
#data_folder_name="news_gen_1000_p095_Pythia"
#data_folder_name="wp_gen_1000_p095_Pythia"
#data_folder_name="wiki2021_1e6_Pythia"
data_folder_name="wiki2021_5e6_Pythia"
#data_folder_name="ROC_spring_Pythia"
#data_folder_name="wikinews_Pythia"
#data_folder_name="wp_5000_Pythia"
#data_folder_name="wp_20000_Pythia"
#data_folder_name="wiki2021_1e5_Pythia"

#top_k="10"
#sampling_methods="10_20"

top_k="20,5,10"
sampling_methods="0_20,20_100,100_inf"
#top_k="20,20,20"
#sampling_methods="0_20,20_100,100_inf"

top_w_idx_model_name="EleutherAI/pythia-6.9b-deduped"
output_folder="data/processed/$data_folder_name/prob_tensor_${bptt}_ext2"
#output_folder="data/processed/$data_folder_name/prob_tensor_${bptt}_ext3"
#input_folder_name="../true_entropy/data/processed/$data_folder_name"
input_folder_name="data/processed/$data_folder_name"

declare -a bsz_arr=(2 4 4 8 12 16)
declare -a model_arr=("EleutherAI/pythia-2.8b-deduped" "EleutherAI/pythia-1.4b-deduped" "EleutherAI/pythia-1b-deduped" "EleutherAI/pythia-410m-deduped" "EleutherAI/pythia-160m-deduped" "EleutherAI/pythia-70m-deduped" )

model_name="EleutherAI/pythia-6.9b-deduped"
batch_size=1
cuda_init=0
echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt

pids=()

for i in "${!model_arr[@]}";
do
    model_name=${model_arr[$i]}
    batch_size=${bsz_arr[$i]}
    echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
    python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt &
    pids+=($!)
done
echo "${pids[@]}"
```
```bash
#!/bin/bash
#bptt=1024
bptt=128

#data_folder_name="ROC_gen_1000_p095_OPT"
#data_folder_name="news_gen_1000_p095_OPT"
#data_folder_name="wp_gen_1000_p095_OPT"
#data_folder_name="openwebtext_2017_18_1e5_OPT"
#data_folder_name="wiki2021_1e6_OPT"
data_folder_name="wiki2021_1e6_Qwen"
#data_folder_name="wiki2021_5e6_OPT"
#data_folder_name="ROC_spring_OPT"
#data_folder_name="wikinews_OPT"
#data_folder_name="wp_5000_OPT"
#data_folder_name="wp_20000_OPT"
#data_folder_name="wiki2021_1e5_OPT"

#top_k="10"
#sampling_methods="10_20"
top_k="20,5,10"
sampling_methods="0_20,20_100,100_inf"

#top_w_idx_model_name="EleutherAI/pythia-6.9b-deduped"
#top_w_idx_model_name="facebook/opt-6.7b"
top_w_idx_model_name="Qwen/Qwen1.5-4b"
#top_w_idx_model_name="Qwen/Qwen1.5-4b-Chat"
#output_folder="data/processed/$data_folder_name/prob_opt_tensor_$bptt"
output_folder="data/processed/$data_folder_name/prob_Qwen_4b_tensor_${bptt}_new"
#output_folder="data/processed/$data_folder_name/prob_Qwen_4b-Chat_tensor_${bptt}_new"
#input_folder_name="../true_entropy/data/processed/$data_folder_name"
input_folder_name="data/processed/$data_folder_name"

declare -a bsz_arr=(4 8)
declare -a model_arr=("Qwen/Qwen1.5-1.8b" "Qwen/Qwen1.5-0.5b" )
#declare -a model_arr=("Qwen/Qwen1.5-1.8b-Chat" "Qwen/Qwen1.5-0.5b-Chat" )

model_name="Qwen/Qwen1.5-4b"
#model_name="Qwen/Qwen1.5-4b-Chat"
batch_size=2
cuda_init=0
echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt

pids=()

for i in "${!model_arr[@]}";
do
    model_name=${model_arr[$i]}
    batch_size=${bsz_arr[$i]}
    echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
    python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt &
    pids+=($!)
done
echo "${pids[@]}"
```
```bash
#!/bin/bash
bptt=1024

#data_folder_name="ROC_gen_1000_p095_OPT"
#data_folder_name="news_gen_1000_p095_OPT"
#data_folder_name="wp_gen_1000_p095_OPT"
#data_folder_name="openwebtext_2017_18_1e5_OPT"
#data_folder_name="wiki2021_1e6_OPT"
data_folder_name="wiki2021_5e6_OPT"
#data_folder_name="ROC_spring_OPT"
#data_folder_name="wikinews_OPT"
#data_folder_name="wp_5000_OPT"
#data_folder_name="wp_20000_OPT"
#data_folder_name="wiki2021_1e5_OPT"

#top_k="10"
#sampling_methods="10_20"
top_k="20,5,10"
sampling_methods="0_20,20_100,100_inf"

#top_w_idx_model_name="EleutherAI/pythia-6.9b-deduped"
top_w_idx_model_name="facebook/opt-6.7b"
#output_folder="data/processed/$data_folder_name/prob_opt_tensor_$bptt"
output_folder="data/processed/$data_folder_name/prob_opt_tensor_${bptt}_new"
#input_folder_name="../true_entropy/data/processed/$data_folder_name"
input_folder_name="data/processed/$data_folder_name"

declare -a bsz_arr=(2 4 8 16)
declare -a model_arr=("facebook/opt-2.7b" "facebook/opt-1.3b" "facebook/opt-350m" "facebook/opt-125m" )

model_name="facebook/opt-6.7b"
batch_size=1
cuda_init=0
echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $cuda_init --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt

pids=()

for i in "${!model_arr[@]}";
do
    model_name=${model_arr[$i]}
    batch_size=${bsz_arr[$i]}
    echo "python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt"
    python src/collect_top_prob.py --model_name=$model_name --top_w_idx_model_name=$top_w_idx_model_name --input_folder_name $input_folder_name --output_folder $output_folder --cuda_idx $i --batch_size $batch_size --top_k $top_k --sampling_methods $sampling_methods --bptt $bptt &
    pids+=($!)
done
echo "${pids[@]}"
```