
Which decoding method works best #4

Open · cecilialeo77 opened this issue May 16, 2024 · 3 comments


@cecilialeo77

Regarding the decoding approach `--decoding-strategy reparam-<...>-<topk_mode>-<...>`: in your experiments, is the default decoding method necessarily worse than the one specified in the script? And across the different datasets, which decoding method's results did you report as the final numbers? Looking forward to your reply!

@LZhengisme
Collaborator

Hey, thanks for reaching out! In our experiments, the default decoding strategy generally underperformed our approach across all tasks discussed in the paper, so we reported results using our improved decoding strategy. You can find the specific --decoding-strategy parameters for each task at the following links:

Feel free to check them out and let me know if you have any more questions!
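
(For readers skimming this thread: a minimal sketch of how such a strategy might be passed on the command line, assuming a fairseq-style generation entry point. The data path and checkpoint below are placeholders, not the repo's exact invocation; only `--decoding-strategy` and the iteration count are taken from this thread.)

```bash
# Minimal sketch, assuming a fairseq-style setup. data-bin/qg and the
# checkpoint path are placeholders; --decoding-strategy and NUM_ITER come
# from this thread, and --iter-decode-max-iter is fairseq's standard flag
# for iterative refinement (assumed here to be what NUM_ITER controls).
NUM_ITER=10

fairseq-generate data-bin/qg \
    --path checkpoints/checkpoint_best.pt \
    --iter-decode-max-iter $NUM_ITER \
    --decoding-strategy reparam-uncond-stochastic5.0-cosine
```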

@cecilialeo77
Author

Thank you for your response! I ask because, when reproducing your results on the QQP and QG tasks, I found that the BLEU score of the default decoding strategy exceeded that of the specified strategy. For example, on the QG task I obtained:

- NUM_ITER: 10, --decoding-strategy default: avg. BLEU 0.17457566633838953
- NUM_ITER: 10, --decoding-strategy reparam-uncond-stochastic5.0-cosine: avg. BLEU 0.17439646335154702
Do you have any suggestions? Should we use the decoding strategy with the higher BLEU score as the final result?

@LZhengisme
Collaborator

Thanks for the details! Yes, there might be a lot of variation for more open-ended generation scenarios, like the question generation task here. It’s not uncommon for the default decoding strategy to sometimes perform competitively on these tasks.

Given your findings, I'd suggest experimenting a bit more if you have time (e.g., replacing --argmax-decoding with a larger temperature, say --temperature 0.5, or tweaking the parameters in uncond-stochastic5.0-cosine) to see whether the default decoding strategy consistently delivers higher BLEU scores. If so, it makes sense to use default decoding as the final strategy for your scenario. But with such close scores (0.1746 vs. 0.1744), sticking with the reparam strategy could still be a good choice.
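
(A sketch of what that comparison could look like, reusing the hypothetical command from above. The strategy names and temperature values are just example sweep points taken from this thread, and BLEU scoring is left to whatever evaluation script your setup already uses.)

```bash
# Hypothetical sweep: compare decoding strategies and temperatures side by
# side. Stochastic strategies vary between runs, so ideally average BLEU
# over several runs rather than trusting a single one.
for STRATEGY in default reparam-uncond-stochastic5.0-cosine; do
  for TEMP in 0.5 1.0; do
    fairseq-generate data-bin/qg \
        --path checkpoints/checkpoint_best.pt \
        --iter-decode-max-iter 10 \
        --decoding-strategy "$STRATEGY" \
        --temperature "$TEMP" \
        > "gen.${STRATEGY}.${TEMP}.out"   # then score BLEU on each output
  done
done
```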

Hope this helps! 😊
