
# awesome-sentence-embedding


A curated list of pretrained sentence and word embedding models

## Table of Contents

## About This Repo

- There are already a few awesome lists for word and sentence embeddings, but all of them are outdated and, more importantly, incomplete.
- This repo will be incomplete too, but I'll try my best to find and include every paper that comes with pretrained models.
- This is not a typical awesome list because it uses tables, but I guess that's fine and much better than one huge flat list.
- If you find any mistakes, know of another paper, or have anything else to add, please send a pull request and help me keep this list up to date.
- Enjoy!

## General Framework

- Almost all sentence embeddings work like this:
- Given some word embeddings and an optional encoder (for example an LSTM), they obtain contextualized word embeddings.
- Then they apply some kind of pooling (which can be as simple as last pooling).
- Based on that, they either use the result directly for a supervised classification task (like InferSent) or use it to generate a target sequence (like Skip-Thought).
- So, in a sense, there are many sentence embeddings you have never heard of: mean-pooling over any word embedding already gives you a sentence embedding! (A minimal sketch of this pipeline follows this list.)
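To make the pipeline concrete, here is a minimal sketch in PyTorch: an embedding lookup, an optional LSTM encoder, and mean pooling over the contextualized vectors. All names and dimensions here are hypothetical and only illustrate the shape of the computation, not any particular paper's model.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, purely for illustration.
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 1000, 50, 64

embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)              # word embeddings
encoder = nn.LSTM(EMBED_DIM, HIDDEN_DIM, batch_first=True)   # optional encoder

def sentence_embedding(token_ids):
    """token_ids: LongTensor of shape (batch, seq_len)."""
    word_vecs = embedding(token_ids)        # (batch, seq_len, EMBED_DIM)
    contextual, _ = encoder(word_vecs)      # (batch, seq_len, HIDDEN_DIM)
    return contextual.mean(dim=1)           # mean pooling -> (batch, HIDDEN_DIM)

def bag_of_vectors(token_ids):
    """Skip the encoder entirely: mean-pooling the raw word vectors
    already yields a (simple) sentence embedding."""
    return embedding(token_ids).mean(dim=1)  # (batch, EMBED_DIM)

tokens = torch.randint(0, VOCAB_SIZE, (2, 7))  # two dummy 7-token sentences
print(sentence_embedding(tokens).shape)        # torch.Size([2, 64])
```

Swapping the mean pooling for last pooling (`contextual[:, -1]`) or the classification head for a decoder is what distinguishes most of the models listed below.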

## Word Embeddings

- Note: don't worry about the language the training code is written in; except for the subword models, you can almost always just load the pretrained embedding table in the framework of your choice and ignore the training code entirely (see the sketch below).
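For example, here is a minimal sketch of loading such a table from the plain-text format used by GloVe and fastText (one word per line followed by its vector). The file name is just an example; any embedding file in this format works.

```python
import numpy as np

def load_embedding_table(path):
    """Parse a GloVe/fastText-style text file: `word v1 v2 ... vD` per line.
    (fastText .vec files start with a `count dim` header line; skip it.)"""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == 2:  # header line of a .vec file
                continue
            table[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return table

# Example path; substitute whichever pretrained table you downloaded.
vectors = load_embedding_table("glove.6B.50d.txt")
```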

{{{word-embedding-table}}}

## OOV Handling

## Contextualized Word Embeddings

- Note: all of the unofficial implementations can load the official pretrained models (see the sketch below).
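As one instance of this pattern, here is a minimal sketch using the HuggingFace Transformers library (an unofficial PyTorch implementation) to load Google's official BERT checkpoint. This is only an example; the same load-official-weights pattern applies to the other unofficial implementations in the table.

```python
# pip install transformers
from transformers import AutoModel, AutoTokenizer

# Unofficial implementation loading the official pretrained weights.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A curated list of pretrained models.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768) contextualized vectors
```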

{{{contextualized-table}}}

## Pooling Methods

## Encoders

{{{encoder-table}}}

## Evaluation

## Misc

## Vector Mapping

## Articles