Wok edited this page Apr 6, 2019 · 21 revisions

An in-depth commentary is provided on the Commentary page. Overall, I would suggest matching store descriptions with:

A retrieval score can be computed, thanks to a ground truth of games set in the same fictional universe. Alternative scores can be computed as the proportions of genres or tags shared between the query and the retrieved games.
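The alternative scores can be illustrated with a minimal sketch. It assumes that the "proportion shared" is the fraction of the query's genres (or tags) that a retrieved game also has, averaged over the retrieved games. The exact definition used for the plots may differ, and the function names and genre data below are hypothetical.

```python
def shared_proportion(query_labels, retrieved_labels):
    """Fraction of the query's labels (genres or tags) that a retrieved game also has."""
    query_labels = set(query_labels)
    if not query_labels:
        return 0.0
    return len(query_labels & set(retrieved_labels)) / len(query_labels)


def mean_shared_proportion(query_labels, retrieved_games_labels):
    """Average the shared proportion over all the retrieved games."""
    if not retrieved_games_labels:
        return 0.0
    scores = [shared_proportion(query_labels, labels) for labels in retrieved_games_labels]
    return sum(scores) / len(scores)


# Hypothetical genre data, for illustration only.
query_genres = ["Action", "RPG"]
retrieved_genres = [["Action", "RPG"], ["Action"], ["Strategy"], ["RPG", "Indie"]]

score = mean_shared_proportion(query_genres, retrieved_genres)
print(score)
```

The same functions apply unchanged to tags, since only set membership is used.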

When using the average of word embeddings as sentence embeddings:

  • removing only sentence components provided a very large increase in the score (+105%),
  • removing only word components provided a large increase in the score (+51%),
  • removing both components provided a very large increase in the score (+108%),
  • relying on a weighted average instead of a simple average led to better results,
  • Tf-Idf reweighting led to better results than Smooth Inverse Frequency reweighting,
  • GloVe word embeddings led to better results than Word2Vec.
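The steps listed above can be sketched with NumPy. This is a minimal illustration, not the code behind the reported numbers. It assumes that "removing sentence components" means projecting out the leading singular directions of the sentence-embedding matrix, and that the per-word weights (for instance Tf-Idf weights) are positive and computed elsewhere. The function names, the toy vocabulary and the weights are hypothetical.

```python
import numpy as np


def weighted_sentence_embeddings(sentences, word_vectors, word_weights):
    """Weighted average of word vectors for each tokenized sentence."""
    dim = len(next(iter(word_vectors.values())))
    embeddings = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in word_vectors]
        if not known:
            continue  # keep an all-zero row for sentences without any known word
        vectors = np.array([word_vectors[t] for t in known])
        # Words without an explicit weight fall back to 1.0 (simple average).
        weights = np.array([word_weights.get(t, 1.0) for t in known])
        embeddings[i] = weights @ vectors / weights.sum()
    return embeddings


def remove_top_components(embeddings, num_components=1):
    """Project out the leading right-singular directions of the matrix."""
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    top = vt[:num_components]
    return embeddings - embeddings @ top.T @ top


# Toy data, for illustration only: random vectors stand in for GloVe or Word2Vec.
rng = np.random.default_rng(0)
vocabulary = ["space", "ship", "sword", "magic", "puzzle"]
word_vectors = {word: rng.normal(size=4) for word in vocabulary}
word_weights = {"space": 1.2, "ship": 0.8, "sword": 1.5, "magic": 1.1, "puzzle": 2.0}
sentences = [["space", "ship"], ["sword", "magic"], ["puzzle", "magic", "unknown"]]

embeddings = weighted_sentence_embeddings(sentences, word_vectors, word_weights)
cleaned = remove_top_components(embeddings, num_components=1)
print(cleaned.shape)
```

Under the same assumption, `remove_top_components` could also be applied to the matrix of word vectors before averaging, which would correspond to removing word components.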

Influence of the removal of sentence components (the same fictional universe)

The same plot on a smaller range, with a tweak that discards the query itself from the retrieved games:

Influence of the removal of sentence components (the same fictional universe), on a smaller range

Similar plots with alternative retrieval scores:

  • based on the proportion of genres shared,

Influence of the removal of sentence components (sharing genres)

  • based on the proportion of tags shared.

Influence of the removal of sentence components (sharing tags)

A table with scores for each major experiment is available. For each game series, the score is the number of games from this series which are found among the top 10 most similar games (excluding the query). The higher the score, the better the retrieval.
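The per-series score described above can be sketched as follows. This is a minimal illustration which assumes cosine similarity between sentence embeddings. The function names, the toy embeddings and the series membership are hypothetical, and the toy example uses the top 3 rather than the top 10 to keep the data small.

```python
import numpy as np


def top_k_similar(embeddings, query_index, k=10):
    """Indices of the k games most similar to the query, excluding the query."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    similarities = normalized @ normalized[query_index]
    order = np.argsort(-similarities)
    order = order[order != query_index]  # discard the query itself
    return order[:k]


def series_score(embeddings, query_index, series_indices, k=10):
    """Number of games from the series found among the top-k retrieved games."""
    retrieved = top_k_similar(embeddings, query_index, k)
    return sum(int(index) in series_indices for index in retrieved)


# Toy embeddings, for illustration only: games 0 to 2 form one hypothetical series.
embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.0, 1.0],
    [0.1, 0.9],
    [0.2, 0.8],
])
series = {0, 1, 2}

score = series_score(embeddings, query_index=0, series_indices=series, k=3)
print(score)
```

Here the two other games of the series are both retrieved, so the score for the query is 2.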

Results can be accessed from the following links:

  • Google's Universal Sentence Encoder
  • Baselines
      • Term Frequency * Inverse Document Frequency (Tf-Idf)
      • Latent Semantic Indexing (LSI/LSA)
      • Random Projections (RP)
      • Latent Dirichlet Allocation (LDA)
      • Hierarchical Dirichlet Process (HDP)
  • Doc2Vec
      • AppIDs
      • AppIDs and categories
      • AppIDs and genres
      • AppIDs, categories and genres
  • Weighted average of word embeddings
      • GloVe
          • Main results
          • Removing sentence components
          • Removing word components
          • Removing both sentence and word components
          • Tweaks
      • Word2Vec
          • Main results
          • Cosine: removing sentence components
          • Minkowski: removing sentence components
