# NLP for Social Sciences

This repository supplements the NLP for Social Sciences course taught during the fall semester of 2024 at the Université Lumière Lyon 2.

Lecture/TD materials are stored in the `./n*` folders. You can run all Jupyter notebooks locally or in Google Colab.

## Course Plan

  1. Introduction to Natural Language Processing. Challenges of text processing (word ambiguity, idioms, slang, spelling errors). Existing applications of NLP (translation, trend analysis, summarization, virtual assistants). Text preprocessing steps. Lemmatization vs stemming (a short sketch follows this list). (CM1) Link
  2. Vector representation of words. Embeddings obtained with one-hot encoding. Distributional hypothesis. Word-word co-occurrence and PMI matrices. Word-document matrices for tf-idf. Overview of word2vec models. (CM2) Link
  3. Basics of gradient descent for simple functions (a minimal sketch appears under TD 1 below). Word embeddings using the gensim library. Visualization with t-SNE. (TD1) Link
  4. Summary of approaches to vector representation. Negative sampling. Word2Vec: skip-gram vs CBOW. Linear operations with vectors, including addition and subtraction (illustrated in the word-vector sketch after this list). Impact of large/small context window size on embedding results. Problem statement for text classification. Overview of feature extraction approaches: count-based vs neural. Overview of text classification with Naive Bayes. (CM3) Link
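To make the lemmatization-vs-stemming distinction from CM1 concrete, here is a minimal sketch using NLTK (an assumption: the course notebooks may use a different library). Stemming strips suffixes heuristically and can produce non-words, while lemmatization maps words to dictionary forms, guided by part of speech.

```python
# Minimal sketch: stemming vs lemmatization with NLTK (assumed library).
import nltk
nltk.download("wordnet", quiet=True)  # data required by the lemmatizer

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

# Stemming chops suffixes by rule, which can produce non-words.
print(stemmer.stem("studies"))                   # 'studi'
# Lemmatization returns a real dictionary form, given a part of speech.
print(lemmatizer.lemmatize("studies", pos="v"))  # 'study'
print(lemmatizer.lemmatize("better", pos="a"))   # 'good'
```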
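As a companion to the word-embedding material in CM2 and CM3, here is a minimal sketch of nearest-neighbour queries and vector arithmetic. It assumes the gensim library with its `downloader` module and a network connection; the pretrained `glove-wiki-gigaword-50` vectors are an arbitrary small model chosen for illustration, not necessarily the one used in the course.

```python
# Minimal sketch: distributional similarity and vector arithmetic
# with pretrained word vectors loaded via gensim's downloader.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # small GloVe model, downloaded once

# Nearest neighbours illustrate the distributional hypothesis:
# words used in similar contexts end up close in vector space.
print(vectors.most_similar("france", topn=3))

# Linear operations with vectors: king - man + woman typically lands near 'queen'.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```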

## TD 1

To run the notebooks on a cloud platform, just click on one of the badges in the table below:

| # | Topic | Colab |
| --- | --- | --- |
| 1 | Preliminaries of gradient descent | Open In Colab |
| 2 | Word embeddings | Open In Colab |
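As a warm-up for the first TD 1 notebook, here is a minimal gradient descent loop on a simple one-variable function, f(x) = (x − 3)², using only the Python standard library; the function, starting point, and learning rate are illustrative choices, not the notebook's.

```python
# Gradient descent on f(x) = (x - 3)^2, whose unique minimum is at x = 3.
def grad(x):
    return 2 * (x - 3)  # analytic derivative of f

x = 0.0             # arbitrary starting point
lr = 0.1            # learning rate (step size)
for _ in range(50):
    x -= lr * grad(x)   # step against the gradient

print(round(x, 4))  # converges to ~3.0
```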

## TD 2

| # | Topic | Colab |
| --- | --- | --- |
| 0 | Text pre-processing | Open In Colab |
| 1 | Supervised text classification | Open In Colab |
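The two TD 2 notebooks fit together as a classic count-based pipeline: preprocess the text, extract tf-idf features, and train a Naive Bayes classifier (the CM3 topics above). Below is a minimal sketch of that pipeline with scikit-learn; the library choice and the four-sentence toy corpus are assumptions for illustration only.

```python
# Minimal sketch: supervised text classification with tf-idf features
# and multinomial Naive Bayes (scikit-learn is an assumed dependency).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy corpus, purely for illustration.
texts = [
    "the match ended with a late goal",
    "the striker scored twice in the final",
    "parliament passed the new budget law",
    "the senate debated the reform bill",
]
labels = ["sport", "sport", "politics", "politics"]

# Vectorizer turns raw text into tf-idf counts; the classifier fits on those.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["the goalkeeper saved a penalty"]))  # likely ['sport']
```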

## Other Useful Resources

  1. https://perso.limsi.fr/anne/MRSD.html (in French)
  2. Jurafsky and Martin, Speech and Language Processing (3rd ed. draft): https://web.stanford.edu/~jurafsky/slp3/ (in English)