
Par_bn and mam_adapter got "eval_matthews_correlation: 0.0" on GLUE-CoLA #618

Status: Closed
Labels: question (Further information is requested)

wenjingk-xilinx opened this issue Dec 15, 2023 · 2 comments


wenjingk-xilinx commented Dec 15, 2023

Environment info

  • adapters version: 0.1.0
  • Platform: linux
  • Python version: 3.8
  • PyTorch version (GPU?): 2.1.1+gpu
  • Tensorflow version (GPU?): -
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Information

Model I am using (Bert, XLNet ...): Bert-large

Language I am using the model on (English, Chinese ...): English

Adapter setup I am using (if any): I use setup_adapter_training(model, adapter_args, data_args.task_name or "glue"), i.e.:

# for Par_bn
setup_adapter_training(model, "par_bn", data_args.task_name or "glue")
# for mam_adapter
setup_adapter_training(model, "mam", data_args.task_name or "glue")

The problem arises when using:

  • my own modified scripts: (give details below)

model=bert-large-uncased
adapter=mam  # or par_bn
TASK_NAME=cola
python run_glue.py \
  --model_name_or_path $model \
  --task_name $TASK_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 1e-4 \
  --num_train_epochs 20.0 \
  --evaluation_strategy "epoch" \
  --output_dir ./save/ \
  --overwrite_output_dir \
  --train_adapter \
  --adapter_config ${adapter}

The task I am working on is:

  • an official GLUE/SQuAD task: GLUE (CoLA)

To reproduce

Steps to reproduce the behavior:

  1. Use this run_glue.py and run it with the script listed above.

Expected behavior

A nonzero eval_matthews_correlation on CoLA. Instead, the metric stays at 0.0 throughout training:

$ python run_script.py
/proj/ossdataset1/wenjingk/peft/adapters/run_adapters.sh: line 18: cd: adapters: No such file or directory
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
12/15/2023 17:06:33 - WARNING - main - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, 16-bits training: False
/proj/ossdataset1/wenjingk/anaconda3/envs/llm/lib/python3.8/site-packages/datasets/load.py:2088: FutureWarning: 'use_auth_token' was deprecated in favor of 'token' in version 2.14.0 and will be removed in 3.0.0.
You can remove this warning by passing 'token=<use_auth_token>' instead.
warnings.warn(
[WARNING|modeling_utils.py:3952] 2023-12-15 17:06:41,336 >> Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-large-uncased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
12/15/2023 17:06:44 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
 10%| 536/5360 [05:10<34:51, 2.31it/s]
{'eval_loss': 0.6233669519424438, 'eval_matthews_correlation': 0.0, 'eval_runtime': 10.1056, 'eval_samples_per_second': 103.21, 'eval_steps_per_second': 12.963, 'epoch': 1.0}
 15%| 804/5360 [07:50<32:58, 2.30it/s]
 20%| 1072/5360 [10:32<30:59, 2.31it/s]
{'eval_loss': 0.618126392364502, 'eval_matthews_correlation': 0.0, 'eval_runtime': 10.0855, 'eval_samples_per_second': 103.415, 'eval_steps_per_second': 12.989, 'epoch': 3.0}
 25%| 1340/5360 [13:11<29:06, 2.30it/s]
 30%| 1608/5360 [15:52<27:10, 2.30it/s]
{'eval_loss': 0.6205015182495117, 'eval_matthews_correlation': 0.0, 'eval_runtime': 10.0833, 'eval_samples_per_second': 103.439, 'eval_steps_per_second': 12.992, 'epoch': 5.0}
 35%| 1876/5360 [18:32<25:13, 2.30it/s]
 40%| 2144/5360 [21:14<23:17, 2.30it/s]
{'eval_loss': 0.6273216009140015, 'eval_matthews_correlation': 0.0, 'eval_runtime': 10.0745, 'eval_samples_per_second': 103.529, 'eval_steps_per_second': 13.003, 'epoch': 7.0}
 45%| 2412/5360 [23:53<21:16, 2.31it/s]
 50%| 2680/5360 [26:35<19:23, 2.30it/s]
{'eval_loss': 0.6188081502914429, 'eval_matthews_correlation': 0.0, 'eval_runtime': 10.0585, 'eval_samples_per_second': 103.694, 'eval_steps_per_second': 13.024, 'epoch': 9.0}
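
(For context, an eval_matthews_correlation of exactly 0.0 on CoLA usually means the classifier is predicting the same label for every validation example, so the correlation degenerates. A quick sanity check with scikit-learn, assuming it is installed, illustrates this:)

# Illustration only: constant predictions yield a Matthews correlation of 0.0
from sklearn.metrics import matthews_corrcoef

labels = [1, 0, 1, 1, 0, 1]   # toy ground-truth binary labels (CoLA-style)
preds = [1, 1, 1, 1, 1, 1]    # degenerate classifier: always the majority class
print(matthews_corrcoef(labels, preds))  # 0.0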

wenjingk-xilinx added the "bug: Something isn't working" label on Dec 15, 2023
wenjingk-xilinx changed the title from "Par_bn and MAM got 'eval_matthews_correlation': 0.0 on GLUE-Cola" to "Par_bn and mam_adapter got 'eval_matthews_correlation': 0.0 on GLUE-Cola" on Dec 15, 2023
calpt self-assigned this on Dec 16, 2023

calpt (Member) commented Dec 16, 2023

Hey @wenjingk-xilinx,

From the script parameters you shared, it seems you're using a bert-large checkpoint. For successful training, you might need to tune the training hyperparameters for this model a bit, e.g. by lowering the learning rate or enabling warmup steps to help the model converge.

I was able to get a Matthews coefficient of ~0.546 after 1 epoch with your training setup (par_bn config) by lowering the learning rate to 5e-5 and setting --warmup_steps 200.
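
For reference, the same invocation as in your script with only those two settings changed would look roughly like this:

python run_glue.py \
  --model_name_or_path bert-large-uncased \
  --task_name cola \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 5e-5 \
  --warmup_steps 200 \
  --num_train_epochs 20.0 \
  --evaluation_strategy "epoch" \
  --output_dir ./save/ \
  --overwrite_output_dir \
  --train_adapter \
  --adapter_config par_bn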

Hope this helps!

calpt added the "question: Further information is requested" label and removed the "bug: Something isn't working" label on Dec 16, 2023
wenjingk-xilinx (Author) commented

Hi @calpt,
Yes, I tried a lower learning rate (lr=5e-6) and got eval_matthews_correlation = 0.502 after one epoch. Thanks for your quick reply!
