
About argmax decoding #2

Open
HyezNee opened this issue Dec 20, 2023 · 1 comment
Comments

HyezNee commented Dec 20, 2023

Hi,
First, I really appreciate your nice work.
I have a question about the sampling code.

In the RDMs paper, line 9 of the 'Sampling from RDMs' pseudo-code says: Draw x̃0,n ∼ Categorical(f(xt,n; θ)/τ).
However, in the code, it seems the model's performance is only reproduced when adding --argmax-decoding, which differs from that description.
Is it true that you turn on argmax-decoding mode when you do sampling?

Collaborator

LZhengisme commented Dec 20, 2023

Hi, thanks for your interest in the work!

We provide scripts to reproduce the experimental results of RDMs in fairseq/experiments, where argmax-decoding = True is used for machine translation (here) and temperature = 0.3 for the question-generation and paraphrasing tasks (here). We also found that a low temperature such as 0.1 or 0.2 achieves results similar to argmax decoding on translation tasks, although with some fluctuation.

We adopt the sampling formulation in the pseudo-code because it subsumes the argmax case: as the temperature approaches 0, the distribution collapses to a point mass on the token with the highest probability, and sampling becomes equivalent to taking the argmax.
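To illustrate that point, here is a minimal sketch (not the RDMs codebase; the function name and toy logits are made up for illustration) of temperature sampling, where the logits are divided by τ before the softmax. At a very low temperature the categorical distribution concentrates almost all mass on the highest-scoring token, so sampling reproduces argmax decoding:

```python
import numpy as np

def sample_with_temperature(logits, tau, rng):
    """Draw one index from Categorical(softmax(logits / tau))."""
    scaled = logits / tau
    scaled -= scaled.max()          # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

rng = np.random.default_rng(0)
logits = np.array([2.0, 1.0, 0.5, -1.0])  # toy per-token scores

# Near tau -> 0 the distribution is (numerically) a point mass on the
# argmax token, so every draw equals np.argmax(logits).
low_tau_draws = [sample_with_temperature(logits, 0.01, rng) for _ in range(100)]
print(set(low_tau_draws))  # only the argmax index survives
```

At a moderate temperature (e.g. τ = 1) the same function still samples other tokens with non-trivial probability, which is the behaviour the pseudo-code describes for τ > 0.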
