Code for our COLING 2022 paper "Multilingual and Multimodal Topic Modelling with Pretrained Embeddings".

Abstract

We present M3L-Contrast, a novel multimodal multilingual (M3L) neural topic model for comparable data that maps multilingual texts and images into a shared topic space using a contrastive objective. As a multilingual topic model, it produces aligned language-specific topics, and as a multimodal model, it infers textual representations of semantic concepts in images. We also show that our model performs almost as well on unaligned embeddings as it does on aligned embeddings.

Our proposed topic model is:

  • multilingual
  • multimodal (image-text)
  • multimodal and multilingual (M3L)

Our model is based on the Contextualized Topic Model (Bianchi et al., 2021).
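
For orientation, here is a minimal sketch of the base Contextualized Topic Model as exposed by the contextualized-topic-models package; the SBERT checkpoint, hyperparameters, and toy corpus are illustrative assumptions, not the exact setup from our paper.

    # Minimal sketch of the base Contextualized Topic Model (Bianchi et al., 2021)
    # using the contextualized-topic-models package. Checkpoint and hyperparameters
    # are assumptions for illustration only.
    from contextualized_topic_models.models.ctm import ZeroShotTM
    from contextualized_topic_models.utils.data_preparation import TopicModelDataPreparation

    docs = ["a small toy corpus with one document per item", "another short document"]

    # Build the BoW vocabulary and contextual (SBERT) embeddings for each document.
    tp = TopicModelDataPreparation("paraphrase-multilingual-mpnet-base-v2")
    training_dataset = tp.fit(text_for_contextual=docs, text_for_bow=docs)

    # ZeroShotTM: the inference network sees only the contextual embeddings,
    # so topics can be inferred for documents in unseen languages.
    ctm = ZeroShotTM(
        bow_size=len(tp.vocab),
        contextual_size=768,  # embedding dimension of the SBERT model above
        n_components=50,      # number of topics (assumption)
    )
    ctm.fit(training_dataset)
    print(ctm.get_topic_lists(10))  # top-10 words per topic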

We use the PyTorch Metric Learning library for the InfoNCE/NTXent loss; a minimal usage sketch follows.
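
This sketch (not our training code) shows how the library's NTXentLoss can be applied to aligned text/image pairs: aligned items receive the same label and are therefore treated as positives. The batch size, embedding dimension, and temperature are illustrative assumptions.

    # Sketch: InfoNCE/NTXent over aligned text/image embeddings with
    # pytorch-metric-learning. Values below are illustrative assumptions.
    import torch
    from pytorch_metric_learning import losses

    loss_func = losses.NTXentLoss(temperature=0.07)

    batch_size, dim = 8, 256
    text_emb = torch.randn(batch_size, dim, requires_grad=True)   # e.g. projected text embeddings
    image_emb = torch.randn(batch_size, dim, requires_grad=True)  # e.g. projected image embeddings

    labels = torch.arange(batch_size)                  # pair i is a positive pair
    embeddings = torch.cat([text_emb, image_emb], 0)   # (2 * batch_size, dim)
    all_labels = torch.cat([labels, labels], 0)

    loss = loss_func(embeddings, all_labels)  # same-label items are pulled together
    loss.backward()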

Model architecture

Dataset

  • Aligned articles from the Wikipedia Comparable Corpora
  • Images from the WIT dataset
  • We will release the article titles and image URLs for the train and test sets (soon!)

Talks and slides

  • Slides and video from my talk at the Helsinki Language Technology seminar

Trained models

We share some of the models we trained; a sketch of how to produce the matching input embeddings follows the list:

  • M3L topic model trained with CLIP embeddings for texts and images
  • M3L topic model trained with multilingual SBERT for text and CLIP for images
  • M3L topic model trained with monolingual SBERT models for the English and German texts and CLIP for images
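
As a rough sketch, the following shows one way to produce these kinds of pretrained embeddings with sentence-transformers; the checkpoint names are common public models and an assumption, not necessarily the exact ones behind the checkpoints above.

    # Sketch: producing SBERT and CLIP embeddings with sentence-transformers.
    # The checkpoint names are assumptions; the released models may expect others.
    from PIL import Image
    from sentence_transformers import SentenceTransformer

    texts = ["An example English sentence.", "Ein deutscher Beispielsatz."]

    # Multilingual SBERT text embeddings (shared space across languages).
    sbert = SentenceTransformer("paraphrase-multilingual-mpnet-base-v2")
    text_emb = sbert.encode(texts)  # shape: (2, 768)

    # CLIP encodes both images and (English) text into the same 512-d space.
    clip = SentenceTransformer("clip-ViT-B-32")
    image = Image.new("RGB", (224, 224))  # placeholder for a real image
    img_emb = clip.encode([image])        # shape: (1, 512)
    clip_text_emb = clip.encode(texts)    # shape: (2, 512)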

Citation

@inproceedings{zosa-pivovarova-2022-multilingual,
    title = "Multilingual and Multimodal Topic Modelling with Pretrained Embeddings",
    author = "Zosa, Elaine  and  Pivovarova, Lidia",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
    address = "Gyeongju, Republic of Korea",
    publisher = "International Committee on Computational Linguistics",
    url = "https://aclanthology.org/2022.coling-1.355",
    pages = "4037--4048",
}