
about training rl model #23

Open
sjchasel opened this issue Aug 14, 2022 · 7 comments
@sjchasel

I have trained the catSeq model and its performance matches what you reported. When I use

```
python3 train.py -data data/kp20k/kp20k_separated/rl/ -vocab data/kp20k/kp20k_separated/rl/ -exp_path=exp -exp catSeq_rl_kp20k -epochs 20 -model_path=model/catSeq_rl_9527 -copy_attention -train_rl -one2many -one2many_mode 1 -batch_size 32 -separate_present_absent -pretrained_model=model/catSeq_9527/catSeq_kp20k.ml.one2many.cat.copy.bi-directional.epoch=3.batch=38098.total_batch=120000.model -max_length 60 -baseline self -reward_type 7 -replace_unk -topk G -seed=9527
```

to train an RL model, the loss looks wrong from the beginning: it is always around -0.000x. What do you think might be the problem?

@sjchasel
Author

I found that this is because the q value is very small, almost zero. Is that normal?

@kenchan0226
Owner

I remember that the loss was very small from the beginning.

@sjchasel
Author

> I remembered that the loss is very small from the beginning.

I turned off early stopping, otherwise it would stop after four checkpoints. I have now trained for 4 epochs, but the loss is still -0.0001. Should I wait for it to train for the full 20 epochs?
I don't know whether this is normal for reinforcement learning models. How many epochs did you train your reinforcement learning model for? Or, if you have any trained RL models, could you share them? Thank you.

@kenchan0226
Owner

Sorry, I do not have any pre-trained models since this was more than three years ago. I remember that the best checkpoint is usually at the 3rd or 4th epoch, so it is reasonable for the training script to stop at the 4th epoch. I think a small loss is normal in my RL training code.
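For context on why a near-zero loss can be normal here: with `-baseline self`, a self-critical policy-gradient loss scales with the gap between the sampled sequence's reward and the greedy-decoded baseline's reward. Once the pretrained model's greedy output is already close in quality to its samples, that advantage is tiny and so is the loss. A minimal sketch of this idea (my own simplification in plain Python, not the actual code in this repo):

```python
def self_critical_loss(sample_log_probs, sample_reward, baseline_reward):
    """Self-critical policy-gradient loss:
    loss = -(r_sample - r_baseline) * sum of token log-probs.
    A small advantage (r_sample ~ r_baseline) yields a near-zero loss
    even when the log-probs themselves are not small.
    """
    advantage = sample_reward - baseline_reward
    return -advantage * sum(sample_log_probs)

# Hypothetical numbers: the sampled sequence scores barely above the
# greedy baseline, so the loss magnitude is tiny.
log_probs = [-1.2, -0.8, -2.1]
loss = self_critical_loss(log_probs, sample_reward=0.301, baseline_reward=0.300)
print(f"{loss:.4f}")
```

The takeaway is that the loss magnitude alone is not a good progress signal in self-critical training; the validation reward of the generated keyphrases is more informative.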

@Struggle-lsl

Can I ask some questions?

@Struggle-lsl

Why is the prediction all

