# NLP for Social Sciences

This repository supplements the NLP for Social Sciences course taught during the fall semester of 2024 at the Université Lumière Lyon 2.

Lecture/TD materials are stored in the `./n*` folders. You can run all Jupyter notebooks locally or in Google Colab.

## Course Plan

  1. Introduction to Natural Language Processing. Challenges of text processing (word ambiguity, idioms, slang, spelling errors). Existing applications of NLP (translation, trend analysis, summarization, virtual assistants). Text preprocessing steps. Lemmatization vs stemming (a short sketch follows this list). (CM1) Link
  2. Vector representation of words. Embeddings obtained with one-hot encoding. Distributional hypothesis. Word-word co-occurrence and PMI matrices. Word-document matrices for tf-idf. Overview of word2vec models. (CM2) Link
  3. Basics of gradient descent for simple functions (a minimal sketch appears under TD 1 below). Word embeddings using the gensim library. Visualization with t-SNE. (TD1) Link
  4. Summary of approaches to vector representation. Negative sampling. Word2Vec: skip-gram vs CBOW. Linear operations with vectors, including addition and subtraction (illustrated in the word-vector sketch after this list). Impact of large/small context window size on embedding results. Problem statement for text classification. Overview of feature extraction approaches: count-based vs neural. Overview of text classification with Naive Bayes. (CM3) Link
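To make the lemmatization-vs-stemming distinction from CM1 concrete, here is a minimal sketch using NLTK (an assumption: the course notebooks may use a different library). Stemming strips suffixes heuristically and can produce non-words, while lemmatization maps words to dictionary forms, guided by part of speech.

```python
# Minimal sketch: stemming vs lemmatization with NLTK (assumed library).
import nltk
nltk.download("wordnet", quiet=True)  # data required by the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops suffixes by rule, which can produce non-words.
print(stemmer.stem("studies"))                   # 'studi'
# Lemmatization returns a real dictionary form, given a part of speech.
print(lemmatizer.lemmatize("studies", pos="v"))  # 'study'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'
```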
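As a companion to the word-embedding material in CM2 and CM3, here is a minimal sketch of nearest-neighbour queries and vector arithmetic. It assumes the gensim library with its `downloader` module and a network connection; the pretrained `glove-wiki-gigaword-50` vectors are an arbitrary small model chosen for illustration, not necessarily the one used in the course.

```python
# Minimal sketch: distributional similarity and vector arithmetic
# with pretrained word vectors loaded via gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small GloVe model, downloaded once

# Nearest neighbours illustrate the distributional hypothesis:
# words used in similar contexts end up close in vector space.
print(vectors.most_similar("france", topn=3))

# Linear operations with vectors: king - man + woman typically lands near 'queen'.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```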

## TD 1

To run the notebooks on a cloud platform, just click on one of the badges in the table below:

| # | Topic | Colab |
| --- | --- | --- |
| 1 | Preliminaries of gradient descent | Open In Colab |
| 2 | Word embeddings | Open In Colab |
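As a warm-up for the first TD 1 notebook, here is a minimal gradient descent loop on a simple one-variable function, f(x) = (x − 3)², using only the Python standard library; the function, starting point, and learning rate are illustrative choices, not the notebook's.

```python
# Gradient descent on f(x) = (x - 3)^2, whose unique minimum is at x = 3.
def grad(x):
    return 2 * (x - 3)  # analytic derivative of f

x = 0.0             # arbitrary starting point
lr = 0.1            # learning rate (step size)
for _ in range(50):
    x -= lr * grad(x)   # step against the gradient

print(round(x, 4))  # converges to ~3.0
```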

## TD 2

| # | Topic | Colab |
| --- | --- | --- |
| 0 | Text pre-processing | Open In Colab |
| 1 | Supervised text classification | Open In Colab |
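The two TD 2 notebooks fit together as a classic count-based pipeline: preprocess the text, extract tf-idf features, and train a Naive Bayes classifier (the CM3 topics above). Below is a minimal sketch of that pipeline with scikit-learn; the library choice and the four-sentence toy corpus are assumptions for illustration only.

```python
# Minimal sketch: supervised text classification with tf-idf features
# and multinomial Naive Bayes (scikit-learn is an assumed dependency).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy corpus, purely for illustration.
texts = [
    "the match ended with a late goal",
    "the striker scored twice in the final",
    "parliament passed the new budget law",
    "the senate debated the reform bill",
]
labels = ["sport", "sport", "politics", "politics"]

# Vectorizer turns raw text into tf-idf counts; the classifier fits on those.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["the goalkeeper saved a penalty"]))  # likely ['sport']
```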

## Other Useful Resources

  1. https://perso.limsi.fr/anne/MRSD.html (in French)
  2. Jurafsky and Martin, Speech and Language Processing (3rd ed. draft): https://web.stanford.edu/~jurafsky/slp3/ (in English)