Hi,
First, thank you for the nice work.
I have a question about the sampling code.
In the RDMs paper, line 9 of the pseudo-code for 'Sampling from RDMs' says: Draw $\tilde{x}_{0,n} \sim \mathrm{Categorical}(f(x_{t,n};\theta)/\tau)$.
However, in the code, it seems that the model's performance can be reproduced when adding `--argmax-decoding`, which differs from that description.
Is it true that you turn on argmax-decoding mode when sampling?
We provide scripts to reproduce the experimental results of RDMs in fairseq/experiments, where argmax-decoding = True is used for machine translation (here) and temperature = 0.3 for question generation and paraphrasing tasks (here). We also found that a low temperature such as 0.1 or 0.2 achieves results similar to argmax decoding on translation tasks, although there may be some fluctuations.
We adopt the sampling formulation in the pseudo-code because it includes the argmax case as a limit: as the temperature approaches 0, the distribution becomes a point mass on the token with the highest probability, so sampling is equivalent to taking the argmax.
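To illustrate the limiting behaviour described above, here is a minimal sketch in plain Python. It is not the repository's implementation: the function names `softmax_with_temperature` and `sample_token`, the `argmax_decoding` argument, and the example logits are all hypothetical, and it assumes that f(x_{t,n};θ) denotes unnormalized logits that are divided by τ before normalization.

```python
import math
import random


def softmax_with_temperature(logits, tau):
    """Return softmax(logits / tau) as a list of probabilities."""
    scaled = [x / tau for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, tau, rng, argmax_decoding=False):
    """Draw one token index (hypothetical helper, for illustration only)."""
    if argmax_decoding:
        # Deterministic choice: index of the largest logit.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, tau)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up logits over a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)

# As tau shrinks, the probability mass concentrates on the top token.
for tau in (1.0, 0.3, 0.1, 0.01):
    probs = softmax_with_temperature(logits, tau)
    print(tau, [round(p, 4) for p in probs])
```

With a very small τ, repeated calls to `sample_token` return the same index as `argmax_decoding=True`, which is the sense in which the argmax case is covered by the sampling formulation. At intermediate values such as 0.1 to 0.3 some randomness remains, consistent with the fluctuations mentioned above.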