
Which decoding method works best #4

Open · cecilialeo77 opened this issue May 16, 2024 · 3 comments


@cecilialeo77

Regarding the decoding approach `--decoding-strategy reparam-<...>-<topk_mode>-<...>`: in your experiments, is the default decoding method necessarily worse than the one specified in the script? And across the different datasets, which decoding method's results did you report as the final numbers? Looking forward to your reply!

@LZhengisme
Collaborator

Hey, thanks for reaching out! In our experiments, the default decoding strategy generally underperformed our approach across all tasks discussed in the paper, so we reported results using our improved decoding strategy. You can find the specific --decoding-strategy parameters for each task at the following links:

Feel free to check them out and let me know if you have any more questions!
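
(For readers skimming this thread: a minimal sketch of how such a strategy might be passed on the command line, assuming a fairseq-style generation entry point. The data path and checkpoint below are placeholders, not the repo's exact invocation; only `--decoding-strategy` and the iteration count are taken from this thread.)

```bash
# Minimal sketch, assuming a fairseq-style setup. data-bin/qg and the
# checkpoint path are placeholders; --decoding-strategy and NUM_ITER come
# from this thread, and --iter-decode-max-iter is fairseq's standard flag
# for iterative refinement (assumed here to be what NUM_ITER controls).
NUM_ITER=10

fairseq-generate data-bin/qg \
    --path checkpoints/checkpoint_best.pt \
    --iter-decode-max-iter $NUM_ITER \
    --decoding-strategy reparam-uncond-stochastic5.0-cosine
```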

@cecilialeo77
Author

Thank you for your response! I ask because, when reproducing your results on the QQP and QG tasks, I found that the BLEU score of the default decoding strategy exceeded that of the specified strategy. For example, on the QG task I obtained:

- NUM_ITER: 10, --decoding-strategy default: avg. BLEU 0.17457566633838953
- NUM_ITER: 10, --decoding-strategy reparam-uncond-stochastic5.0-cosine: avg. BLEU 0.17439646335154702
Do you have any suggestions? Should we use the decoding strategy with the higher BLEU score as the final result?

@LZhengisme
Collaborator

Thanks for the details! Yes, there might be a lot of variation for more open-ended generation scenarios, like the question generation task here. It’s not uncommon for the default decoding strategy to sometimes perform competitively on these tasks.

Given your findings, I'd suggest experimenting a bit more if you have time (e.g., replacing --argmax-decoding with a larger temperature, say --temperature 0.5, or tweaking the parameters in uncond-stochastic5.0-cosine) to see whether the default decoding strategy consistently delivers higher BLEU scores. If so, it makes sense to use default decoding as the final strategy for your scenario. But with such close scores (0.1746 vs. 0.1744), sticking with the reparam strategy could still be a good choice.
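
(A sketch of what that comparison could look like, reusing the hypothetical command from above. The strategy names and temperature values are just example sweep points taken from this thread, and BLEU scoring is left to whatever evaluation script your setup already uses.)

```bash
# Hypothetical sweep: compare decoding strategies and temperatures side by
# side. Stochastic strategies vary between runs, so ideally average BLEU
# over several runs rather than trusting a single one.
for STRATEGY in default reparam-uncond-stochastic5.0-cosine; do
  for TEMP in 0.5 1.0; do
    fairseq-generate data-bin/qg \
        --path checkpoints/checkpoint_best.pt \
        --iter-decode-max-iter 10 \
        --decoding-strategy "$STRATEGY" \
        --temperature "$TEMP" \
        > "gen.${STRATEGY}.${TEMP}.out"   # then score BLEU on each output
  done
done
```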

Hope this helps! 😊
