
About argmax decoding #2

Open
HyezNee opened this issue Dec 20, 2023 · 1 comment
Comments

HyezNee commented Dec 20, 2023

Hi,
First, I really appreciate your nice work.
I have a question about the sampling code.

In the RDMs paper, line 9 of the 'Sampling from RDMs' pseudo-code says: Draw x̃0,n ∼ Categorical(f(xt,n; θ)/τ).
However, in the code, it seems the model's performance is only reproduced when adding --argmax-decoding, which differs from that description.
Is it true that you turn on argmax-decoding mode when you do sampling?

Collaborator

LZhengisme commented Dec 20, 2023

Hi, thanks for your interest in the work!

We provide scripts to reproduce the experimental results of RDMs in fairseq/experiments, where argmax-decoding = True is used for machine translation (here) and temperature = 0.3 for the question-generation and paraphrasing tasks (here). We also found that a low temperature such as 0.1 or 0.2 achieves results similar to argmax decoding on translation tasks, although with some fluctuation.

We adopt the sampling formulation in the pseudo-code because it subsumes the argmax case: as the temperature approaches 0, the distribution collapses to a point mass on the token with the highest probability, and sampling becomes equivalent to taking the argmax.
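To illustrate that point, here is a minimal sketch (not the RDMs codebase; the function name and toy logits are made up for illustration) of temperature sampling, where the logits are divided by τ before the softmax. At a very low temperature the categorical distribution concentrates almost all mass on the highest-scoring token, so sampling reproduces argmax decoding:

```python
import numpy as np

def sample_with_temperature(logits, tau, rng):
    """Draw one index from Categorical(softmax(logits / tau))."""
    scaled = logits / tau
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy per-token scores

# Near tau -> 0 the distribution is (numerically) a point mass on the
# argmax token, so every draw equals np.argmax(logits).
low_tau_draws = [sample_with_temperature(logits, 0.01, rng) for _ in range(100)]
print(set(low_tau_draws))  # only the argmax index survives
```

At a moderate temperature (e.g. τ = 1) the same function still samples other tokens with non-trivial probability, which is the behaviour the pseudo-code describes for τ > 0.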
