Static word embedding algorithms like word2vec and GloVe assign a single vector to each word, ignoring context. ELMo, a contextualized embedding model, addresses this by capturing word meaning in context using stacked Bi-LSTM layers. This README outlines the implementation and training of an ELMo architecture from scratch using PyTorch.
The ELMo architecture consists of stacked Bi-LSTM layers to generate contextualized word embeddings. Weights for combining word representations across layers are trained.
ELMo embeddings are learned through bidirectional language modeling on the given dataset's train split.
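The stacked Bi-LSTM encoder described above can be sketched as follows. All class names, vocabulary size, and dimensions here are illustrative placeholders, not the repository's actual hyperparameters; the embedding dimension is set to twice the LSTM hidden size so every layer's output has the same width and the layers can later be mixed.

```python
import torch
import torch.nn as nn

class ELMoEncoder(nn.Module):
    """Sketch of an ELMo-style encoder: embeddings + stacked Bi-LSTMs."""

    def __init__(self, vocab_size=1000, emb_dim=200, hidden_dim=100, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        # Each Bi-LSTM layer consumes the previous layer's output
        # (emb_dim == 2 * hidden_dim, so all layer widths match).
        self.lstms = nn.ModuleList([
            nn.LSTM(emb_dim if i == 0 else 2 * hidden_dim,
                    hidden_dim, batch_first=True, bidirectional=True)
            for i in range(num_layers)
        ])

    def forward(self, token_ids):
        # Layer 0 is the non-contextual embedding; each Bi-LSTM adds
        # one contextual representation on top of it.
        layer_outputs = [self.embedding(token_ids)]
        x = layer_outputs[0]
        for lstm in self.lstms:
            x, _ = lstm(x)
            layer_outputs.append(x)
        return layer_outputs  # one (batch, seq, 2*hidden_dim) tensor per layer
```

During pretraining, a projection over these outputs would be trained to predict the next token (forward) and previous token (backward), i.e. the bidirectional language-modeling objective.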
- Trained model: `bilstm.pt`
- Download Model
Trained the ELMo architecture on a 4-way classification task using the AG News Classification Dataset (the same dataset used for the other word-embedding methods; see my Word_Vectorization repository for details).
Trained the λs for combining word representations across layers and kept the best-performing values.
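The trainable λs can be sketched as a scalar mix in the style of the ELMo paper: softmax-normalized per-layer weights plus a global scale γ. The class name and layer count are illustrative, not the repository's actual code.

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """Combine per-layer representations with trainable weights (λs)."""

    def __init__(self, num_layers=3):
        super().__init__()
        # One λ per layer, learned jointly with the classifier.
        self.lambdas = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_outputs):
        # Softmax keeps the mixture weights positive and summing to 1.
        weights = torch.softmax(self.lambdas, dim=0)
        mixed = sum(w * h for w, h in zip(weights, layer_outputs))
        return self.gamma * mixed
```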
- Model: `classifier_1.pt`
- Download Model
Randomly initialized and froze the λs.
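The frozen setting keeps the same mixing formula but draws the λs once at random and excludes them from the optimizer. A minimal standalone sketch (shapes and layer count are illustrative):

```python
import torch

# λs sampled once; plain tensors (requires_grad=False) are never updated,
# so the mixture weights stay fixed for the whole downstream training run.
lambdas = torch.rand(3)
weights = torch.softmax(lambdas, dim=0)

layers = [torch.randn(2, 5, 200) for _ in range(3)]
mixed = sum(w * h for w, h in zip(weights, layers))
```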
- Model: `classifier_2.pt`
- Download Model
Learned a function to combine word representations across layers.
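One way to learn such a combining function is a linear map over the concatenated layer outputs, which is strictly more expressive than scalar λs because it can mix individual dimensions. This is a hypothetical sketch, not necessarily the function used in the repository:

```python
import torch
import torch.nn as nn

class LearnedCombine(nn.Module):
    """Learned combination: project concatenated layers back to one vector."""

    def __init__(self, num_layers=3, dim=200):
        super().__init__()
        self.proj = nn.Linear(num_layers * dim, dim)

    def forward(self, layer_outputs):
        # (batch, seq, num_layers * dim) -> (batch, seq, dim)
        stacked = torch.cat(layer_outputs, dim=-1)
        return self.proj(stacked)
```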
- Model: `classifier_3.pt`
- Download Model
Performed a comprehensive analysis of ELMo's performance in pretraining and on the downstream task, compared against SVD and Word2Vec embeddings. Reported accuracy, F1 score, precision, recall, and confusion matrices for each setting.
```python
data = torch.load("<filename>")
```
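A self-contained round-trip example of the loading call, using a throwaway tensor and a temporary file (the repository's `.pt` files may instead hold full models or state dicts). `map_location="cpu"` lets GPU-trained checkpoints load on CPU-only machines:

```python
import os
import tempfile

import torch

# Save a dummy object, then restore it the same way the models above
# would be restored with torch.load.
path = os.path.join(tempfile.gettempdir(), "demo.pt")
torch.save(torch.arange(4), path)
data = torch.load(path, map_location="cpu")
```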
Note:
- While pretraining ELMo, only the first 10,000 sentences of `train.csv` were used.
- The downstream task also used only the first 10,000 training sentences.