Transformer-Based Model for Machine Translation, English to Persian
- Install Dependencies
$ pip install -r requirements.txt
- Download Pretrained Weights Here
- Project Structure
.
├── src
│ ├── nn
│ │ ├── attention.py
│ │ ├── decoder.py
│ │ ├── dropout.py
│ │ ├── embedding.py
│ │ ├── encoder.py
│ │ ├── __init__.py
│ │ └── transformer.py
│ ├── dataset.py
│ ├── misc.py
│ ├── schedule.py
│ ├── tokenizer.py
│ ├── tracker.py
│ ├── trainutils.py
│ └── vocab.py
├── build.py
├── config.py
├── inference.py
└── main.py
Fig. 1. Proposed Model Architecture
Positional Encoding: Since our model contains no recurrence and no convolution, in order for the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. In this work, we use sine and cosine functions of different frequencies:
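$$PE_{(pos,\,2i)} = \sin\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(\frac{pos}{10000^{2i/d_{\text{model}}}}\right)$$

where $pos$ is the token position and $i$ indexes the embedding dimension. As an illustration only, below is a minimal sketch of this encoding, assuming PyTorch; the project's actual implementation (presumably under src/nn/) may differ in details.

```python
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sinusoidal position encodings.

    Illustrative sketch, not the repository's own code.
    """
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)      # (max_len, 1)
    # Frequencies 1 / 10000^(2i / d_model) for the even dimensions.
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-torch.log(torch.tensor(10000.0)) / d_model)
    )
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe
```

The resulting matrix is simply added to the token embeddings, e.g. `x = embeddings + sinusoidal_positional_encoding(seq_len, d_model)`, so position information enters the model without any learned parameters.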
Multi-Head Attention: Multi-head attention runs the attention mechanism several times in parallel. The independent attention outputs are then concatenated and linearly transformed into the expected dimension. Intuitively, multiple attention heads allow the model to attend to different parts of the sequence in different ways (e.g. longer-term dependencies versus shorter-term dependencies).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O$$

where $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\; K W_i^K,\; V W_i^V)$, and the scaled dot-product attention used above is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V$$
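To make the head-splitting and recombination concrete, here is a compact sketch of multi-head attention, again assuming PyTorch; it illustrates the equations above rather than reproducing the code in src/nn/attention.py.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttention(nn.Module):
    """Scaled dot-product attention run over several heads in parallel (illustrative sketch)."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Projections for queries, keys, values, plus the output projection W^O.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value, mask=None):
        batch, q_len, _ = query.shape
        k_len = key.shape[1]

        # Project, then split into heads: (batch, heads, seq_len, d_head).
        q = self.w_q(query).view(batch, q_len, self.num_heads, self.d_head).transpose(1, 2)
        k = self.w_k(key).view(batch, k_len, self.num_heads, self.d_head).transpose(1, 2)
        v = self.w_v(value).view(batch, k_len, self.num_heads, self.d_head).transpose(1, 2)

        # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        weights = F.softmax(scores, dim=-1)
        out = weights @ v

        # Concatenate the heads and apply the final linear transformation.
        out = out.transpose(1, 2).contiguous().view(batch, q_len, -1)
        return self.w_o(out)
```

The same module can serve encoder self-attention, masked decoder self-attention, and encoder-decoder cross-attention; only the tensors passed as query, key, and value (and the mask) change.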
This project is distributed under the MIT License.