# Generation Improving Fine Tuning

This repository provides code implementing four fine-tuning methodologies aimed at enhancing the generation capabilities of natural language generation models: Standard, Auxiliary, Recurrent, and Generative. Detailed descriptions of each methodology, along with the basic setup and performance evaluations on machine translation, are provided below.

## Fine-Tuning Strategies

### Standard

Standard fine-tuning is the most common method: the parameters of a pre-trained model are trained further with the same training process as before, but with a reduced learning rate for fine adjustments.
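
A minimal sketch of what this could look like in PyTorch. The model interface (`model(src, dec_in)` returning per-token logits), the learning rate, and names such as `dataloader` and `pad_id` are illustrative assumptions, not taken from this repository:

```python
import torch

def standard_finetune(model, dataloader, device, lr=1e-5, pad_id=0):
    # Reduced learning rate relative to pre-training, for fine adjustments
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss(ignore_index=pad_id)

    model.train()
    for src, trg in dataloader:
        src, trg = src.to(device), trg.to(device)
        # Teacher forcing: the decoder sees the gold prefix trg[:, :-1]
        logits = model(src, trg[:, :-1])
        loss = criterion(logits.reshape(-1, logits.size(-1)),
                         trg[:, 1:].reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```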


### Auxiliary

The Auxiliary strategy reduces the risk of exposure bias by using First Token Prediction as an auxiliary training objective alongside Maximum Likelihood Estimation (MLE), which remains the main training objective.
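
A minimal sketch of such a combined loss, assuming logits of shape `(batch, seq_len, vocab)` already aligned with the gold targets `trg`; the `aux_weight` hyperparameter is an assumption, not taken from this repository:

```python
import torch.nn.functional as F

def auxiliary_loss(logits, trg, pad_id=0, aux_weight=0.5):
    # Main objective: token-level MLE over the whole sequence
    mle = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          trg.reshape(-1), ignore_index=pad_id)
    # Auxiliary objective: get the first target token right, which
    # anchors generation and is meant to mitigate exposure bias
    first_tok = F.cross_entropy(logits[:, 0, :], trg[:, 0])
    return mle + aux_weight * first_tok
```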


### Recurrent

The Recurrent approach is a fine-tuning method inspired by Scheduled Sampling for Transformers, in which the decoder's output is recursively fed back as input to the decoder. Unlike the scheduled sampling typically used with RNN Seq2Seq models, however, all decoder input values are replaced with the decoder's output, rather than a sampled mixture of gold and predicted tokens.
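
A minimal sketch of one such two-pass training step, under the same assumed `model(src, dec_in)` interface; `criterion` and `bos_id` are placeholders:

```python
import torch

def recurrent_step(model, criterion, src, trg, bos_id=1):
    # Pass 1: ordinary teacher-forced pass to obtain model predictions
    with torch.no_grad():
        first_logits = model(src, trg[:, :-1])
        preds = first_logits.argmax(dim=-1)

    # Unlike classic scheduled sampling, *all* decoder inputs are
    # replaced by the model's own predictions (shifted right after BOS)
    bos = torch.full((preds.size(0), 1), bos_id, device=preds.device)
    dec_in = torch.cat([bos, preds[:, :-1]], dim=1)

    # Pass 2: train against the gold targets using the predicted inputs
    logits = model(src, dec_in)
    return criterion(logits.reshape(-1, logits.size(-1)),
                     trg[:, 1:].reshape(-1))
```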


### Generative

The Generative method incorporates generation into the training process itself by applying the inference-time generation procedure to a certain proportion of training steps.
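
A minimal sketch of one possible reading of this, in which a fraction of batches builds the decoder input by inference-style greedy generation instead of teacher forcing; `gen_ratio`, `bos_id`, and the `model(src, dec_in)` interface are assumptions:

```python
import random
import torch

def generative_step(model, criterion, src, trg, gen_ratio=0.3, bos_id=1):
    max_len = trg.size(1) - 1
    if random.random() < gen_ratio:
        # Inference-style pass: grow the decoder input token by token
        dec_in = torch.full((src.size(0), 1), bos_id, device=src.device)
        with torch.no_grad():
            for _ in range(max_len - 1):
                next_tok = model(src, dec_in)[:, -1:].argmax(dim=-1)
                dec_in = torch.cat([dec_in, next_tok], dim=1)
    else:
        dec_in = trg[:, :-1]  # ordinary teacher forcing

    # The loss is always computed against the gold targets
    logits = model(src, dec_in)
    return criterion(logits.reshape(-1, logits.size(-1)),
                     trg[:, 1:].reshape(-1))
```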



## Setups

| Dataset | Model | Training |
|:---:|:---:|:---:|
| WMT14 En-De | Transformer Seq2Seq | Num of Epochs: 10 |



## Results

| Strategy | Score | Epoch Time | Avg GPU | Max GPU |
|:---|:---:|:---:|:---:|:---:|
| Baseline | | | | |
| Standard | | | | |
| Auxiliary | | | | |
| Recurrent | | | | |
| Generative | | | | |



## How to Use

Clone the repository to your local environment:

```
git clone
```

Set up the dataset and tokenizer:

```
python3 setup.py
```

Run the actual process via the run.py file:

```
python3 run.py -mode     [train, finetune, test, inference]
               -strategy [standard(default), auxiliary, recurrent, generative]
               -search   [beam, greedy]
```



## Reference

- Attention Is All You Need
- Scheduled Sampling for Transformers