Implementation of a machine translator using a sequence-to-sequence LSTM network (encoder-decoder) with attention in TensorFlow.
We use the OpenSubtitles dataset, a collection of movie subtitles in many languages. We selected data for two translation tasks: English -> German and French -> English.
Note: Some of the language data is in TMX format, which must be converted into a usable format before the model can be trained. I have written a TMX converter that does this; it can be used on any language dataset to build a machine translator.
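To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the converter shipped with this project. It assumes the usual TMX layout (`<tu>` translation units containing `<tuv>` variants with an `xml:lang` or `lang` attribute and a `<seg>` child). The function name `tmx_to_pairs` and the sample document are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return a list of (source, target) sentence pairs from a TMX string.

    Hypothetical helper: assumes <tu>/<tuv>/<seg> elements without an XML
    namespace and language codes given in lowercase (e.g. "en", "de").
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline markup.
                segments[lang.lower()] = "".join(seg.itertext()).strip()
        # Keep only units that contain both requested languages.
        if src_lang in segments and tgt_lang in segments:
            pairs.append((segments[src_lang], segments[tgt_lang]))
    return pairs


# Tiny made-up document, for illustration only.
SAMPLE = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>My name is</seg></tuv>
      <tuv xml:lang="de"><seg>Mein Name ist</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>How are you</seg></tuv>
    </tu>
  </body>
</tmx>"""

pairs = tmx_to_pairs(SAMPLE, "en", "de")
for source, target in pairs:
    print(source + "\t" + target)
```

The second translation unit is skipped because it has no German variant. For very large subtitle files, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be a better fit than loading the whole document into memory.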
- python 2.7
- tensorflow 1.0.1
- numpy 1.13.0
- scikit-learn 0.18.2
- matplotlib 1.5.1
- English to German text translation
- French to English text translation
- What' s your name -> Was ist denn Name
- My name is -> Mein Name ist
- What are you doing -> Was machst du gemacht
- I am reading a book -> Ich bin ein Buch Buch
- How are you -> Wie geht' s
- Quel est ton nom -> What is your name
- Mon nom est -> My name is
- Qu'est-ce que tu fais -> You are wrong
- Oui -> Yeah
- Non -> No