pre-training-BERT

Since there is not much documentation on pre-training BERT, I decided to put it all in one place.

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art NLP neural network developed by Google and is even used for Google Search. Recurrent Neural Networks (RNNs) were the standard in NLP before transformers, but RNNs are flawed in multiple ways: they cannot remember long-term dependencies. In more layman's terms, a word occurring early in a sentence loses its dependency on a word occurring far later in the sentence. One solution to this was the ELMo architecture, which runs two separate LSTMs, one left-to-right and one right-to-left, and performs a shallow concatenation of their outputs; the same can be done with the Keras Bidirectional wrapper. BERT, on the other hand, computes the dependency of each word on every other word in the sentence by performing "self-attention". The attention mechanism makes transformer models deeply bidirectional, as the network is able to capture dependencies on words occurring far later in the sentence.
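
As a point of reference, here is a minimal sketch (not from this repo) of the "shallow" bidirectional approach described above: the Keras Bidirectional wrapper runs a forward and a backward LSTM and concatenates their states. The vocabulary size, sequence length, and layer sizes are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 10000   # assumed vocabulary size for illustration
max_len = 128        # assumed sequence length

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, 128),
    # Bidirectional runs a forward and a backward LSTM and concatenates
    # their hidden states -- the "shallow concatenation" mentioned above.
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Unlike self-attention, this only glues together two one-directional views of the sentence rather than letting every token attend to every other token.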

BERT requires a pre-training task to capture dependencies (i.e. to learn the basics of the language) based on each word's position and occurrence in a sentence. Transformer models are pre-trained on huge datasets to build language understanding and are then fine-tuned on a downstream task such as classification, part-of-speech tagging, etc. When working with newer or different kinds of data, such as multilingual or code-mixed text, we may need to train a BERT model from scratch; we'll see how to achieve that in this notebook.
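
For orientation, the sketch below shows one common way to pre-train BERT from scratch with the Hugging Face transformers and datasets libraries: a freshly initialised BertForMaskedLM, a masked-language-modelling data collator, and the Trainer API. This is not the repo's notebook; the tokenizer directory, corpus file name, and hyperparameters are assumptions.

```python
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Assumed: a WordPiece tokenizer already trained on your own corpus and saved locally.
tokenizer = BertTokenizerFast.from_pretrained("./my-tokenizer")

# Assumed: a plain-text corpus, one document or sentence per line.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Fresh config = randomly initialised weights, not a pre-trained checkpoint.
config = BertConfig(vocab_size=tokenizer.vocab_size)
model = BertForMaskedLM(config)

# Masks 15% of tokens so the model must predict them from both directions.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
                                           mlm_probability=0.15)

args = TrainingArguments(output_dir="bert-from-scratch",
                         per_device_train_batch_size=16,
                         num_train_epochs=1)

Trainer(model=model, args=args, data_collator=collator,
        train_dataset=tokenized).train()
```

Once pre-training converges, the same checkpoint can be loaded for fine-tuning on a downstream task such as classification or part-of-speech tagging.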
