Machine Translation

Transformer-based model for English-to-Persian machine translation.

Quick Links

Dependencies

  • Install dependencies: $ pip install -r requirements.txt
  • Download pretrained weights Here

Getting Started

  • Project Structure
.
├── src
│   ├── nn
│   │   ├── attention.py
│   │   ├── decoder.py
│   │   ├── dropout.py
│   │   ├── embedding.py
│   │   ├── encoder.py
│   │   ├── __init__.py
│   │   └── transformer.py
│   ├── dataset.py
│   ├── misc.py
│   ├── schedule.py
│   ├── tokenizer.py
│   ├── tracker.py
│   ├── trainutils.py
│   └── vocab.py
├── build.py
├── config.py
├── inference.py
└── main.py

Architecture

Fig. 1. Proposed Model Architecture

Modules

Positional Encoding: Since our model contains no recurrence and no convolution, in order for the model to make use of the order of the sequence, we must inject some information about the relative or absolute position of the tokens in the sequence. In this work, we use sine and cosine functions of different frequencies:

$\ PE_{(pos, 2i)} = \sin({\frac{pos}{10000^\frac{2i}{d_{model}}}})$

$\ PE_{(pos, 2i+1)} = \cos({\frac{pos}{10000^\frac{2i}{d_{model}}}})$
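A minimal sketch of these two formulas as a standalone PyTorch function (this is an illustration, not the repository's src/nn/embedding.py; max_len and d_model are example parameters and an even d_model is assumed):

```python
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) table of sine/cosine positional encodings."""
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    two_i = torch.arange(0, d_model, 2, dtype=torch.float32)        # even dimension indices 2i
    div = torch.pow(10000.0, two_i / d_model)                       # 10000^(2i / d_model)

    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(pos / div)   # PE(pos, 2i)
    pe[:, 1::2] = torch.cos(pos / div)   # PE(pos, 2i+1)
    return pe

# Usage: add the table to token embeddings of shape (batch, seq_len, d_model), e.g.
# pe = sinusoidal_positional_encoding(max_len=512, d_model=256)
# x = token_embeddings + pe[: token_embeddings.size(1)]
```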

Multi-Head Attention: Multi-head attention runs several attention mechanisms in parallel. The independent attention outputs are then concatenated and linearly projected to the expected dimension. Intuitively, multiple attention heads allow the model to attend to different parts of the sequence in different ways (e.g. longer-term dependencies versus shorter-term dependencies).

$\ MultiHead(Q, K, V) = Concat(head_1, ..., head_h)W^O$

where $\ head_i = Attention(Q{W_i}^Q, K{W_i}^K, V{W_i}^V)$

$\ Attention(Q, K, V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$

where the $\ W$ matrices above are all learnable parameters.
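A compact PyTorch sketch of these equations (again a standalone illustration rather than the repository's src/nn/attention.py; d_model and num_heads are example parameters):

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention following the equations above."""

    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        # W^Q, W^K, W^V for all heads fused into single projections, plus W^O.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)

        def split(x):
            # (batch, seq, d_model) -> (batch, heads, seq, d_k)
            return x.view(batch, -1, self.num_heads, self.d_k).transpose(1, 2)

        q, k, v = split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v))

        # Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        context = torch.softmax(scores, dim=-1) @ v

        # Concat(head_1, ..., head_h) W^O
        context = context.transpose(1, 2).contiguous()
        context = context.view(batch, -1, self.num_heads * self.d_k)
        return self.w_o(context)
```

Fusing the per-head projections into single d_model x d_model linear layers is equivalent to applying separate $\ {W_i}^Q, {W_i}^K, {W_i}^V$ per head, just computed in one matrix multiply.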

Dataset

We use the machine_translation_daily_dialog_en_fa dataset to train our model; you can find it Here

Training

🛡️ License

This project is distributed under the MIT License.
