
Code for our UCL NLP Final Project, using Hoyer-Square regularisation for learning sparse embeddings.

George Lamb, Kush Madlani, Udeepa Meepegama, Aneesh Pappu
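
For readers unfamiliar with the penalty, Hoyer-Square is the squared ratio of a vector's L1 and L2 norms, (||w||_1 / ||w||_2)^2, which pushes many coordinates to exactly zero while staying invariant to rescaling of the vector. The snippet below is only an illustrative sketch of that idea in PyTorch; the project's actual regulariser lives in the `waku` directory, and the names, shapes, and penalty strength here are assumptions rather than values taken from this repository.

```python
# Hypothetical sketch of a Hoyer-Square regulariser: (||w||_1 / ||w||_2)^2.
# Not the project's implementation; see the "waku" directory for the real code.
import torch

def hoyer_square(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Return the Hoyer-Square penalty, which encourages sparse weights."""
    l1 = w.abs().sum()
    l2 = w.pow(2).sum().sqrt()
    return (l1 / (l2 + eps)) ** 2

# Illustrative usage: add the penalty to a task loss with strength lam.
embeddings = torch.randn(1000, 300, requires_grad=True)  # toy embedding matrix
task_loss = torch.tensor(0.0)                            # placeholder objective
lam = 1e-4                                               # assumed penalty weight
loss = task_loss + lam * hoyer_square(embeddings)
loss.backward()
```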

## Directory Structure

### Code

Code for training the WAKU pipeline, running the downstream task evaluations, and related utilities resides in the main directory, titled `waku`.

### Data

All input data should be placed in the folder titled `raw_data`. Trained embeddings should be written to the folder titled `embeddings`.

The "questions-words" dataset for the word analogy task can be found here

### Downstream Evaluation

A notebook that loads trained embeddings, evaluates them on the sentiment analysis, word analogy, word similarity, and word intrusion tasks, and produces t-SNE visualisations can be found in the folder titled `notebooks`.
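
For a quick look at trained vectors outside the notebook, something along the following lines works. The word2vec-style text format and the file name `embeddings/waku_vectors.txt` are assumptions about how the pipeline writes its output, not guarantees from this repository.

```python
# Hypothetical example: load embeddings from the "embeddings" folder and
# project them to 2-D with t-SNE. File name and text format are assumed.
import numpy as np
from sklearn.manifold import TSNE

words, vectors = [], []
with open("embeddings/waku_vectors.txt", encoding="utf-8") as f:
    for line in f:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:  # skip a possible "<vocab_size> <dim>" header line
            continue
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])

X = np.array(vectors)
X_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
print(words[:5], X_2d[:5])
```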