This is a set of lectures and material on natural language processing providing an introduction to the main methods and algorithms used to address practical problems involving natural language. See also the current courses related to this topic:
- Introduction to Natural Language Processing: course offered in the 1st semester of 2020 to the students of the Graduate Program in Informatics at PUC Minas.
Natural language processing (NLP) is one of the most important technologies today, since people communicate almost everything in language: emails, instant messages, web searches, posts on social networks, customer service and medical reports. In particular, NLP refers to the set of methods for making human language accessible to machines [2]. It covers the design and analysis of representations, methods and algorithms that solve practical language problems by taking unstructured natural language data as input or producing it as output [3]. Typical NLP problems include automatic speech recognition, text summarization, information extraction, machine translation, natural language understanding and generation, and sentiment and discourse analysis.
The history of NLP dates back to the 1950s, with experiments on automatic machine translation [6]. In the following years, experiments on chatbots, conceptual ontologies and question answering were developed, and the proposed approaches were mostly based on complex sets of hand-written rules. In the late 1980s, the introduction of machine learning algorithms for language processing produced a new paradigm distinct from rule-based NLP, with research mostly focusing on the development of statistical models that make probabilistic decisions based on features extracted from text corpora [5].
Recent advances in artificial intelligence and high-performance computing have led to an intensive use of new machine learning models powering NLP applications. In particular, approaches based on deep neural networks have obtained very high performance across many different NLP tasks [4]. These approaches can often be trained end-to-end as a single model and do not require traditional, task-specific feature engineering. Such neural NLP approaches have been more effective at understanding complex language utterances and have been viewed as a new paradigm distinct from statistical NLP.
- Introduction to NLP [ sl01 | sl02 | sl03 ]
- Basic text processing [ sl04 ]
- Text representation [ sl05 | sl06 ]
- Text classification [ sl07 | sl08 | sl09 | sl10 ]
- Language modeling [ sl11 | sl12 ]
- POS Tagging [ sl13 ]
- Parsing and context-free grammars [ sl14 | sl15 | sl16 ]
- Information extraction [ sl17 | sl18 ]
- Text summarization [ sl19 ]
- Machine translation [ sl20 ]
- Question answering [ sl21 ]
- Language generation [ sl22 ]
- Text Semantics [ sl23 ]
- Introduction to neural networks [ sl24 | sl25 | sl26 ]
- Text embeddings [ slides ]
- Neural language models [ slides ]
- And more... NLP with Deep Learning
- Channel: Natural Language Processing by Dan Jurafsky and Christopher Manning.
- Channel: From Languages to Information by Dan Jurafsky.
- Channel: Natural Language Processing with Deep Learning by Stanford University.
- Natural Language Processing by Dan Jurafsky and Christopher Manning.
- Neural Networks and Deep Learning by Michael A. Nielsen.
- Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
Most of the topics of the lectures are taken from [1], [2] and [3]. Material and assignments are mostly inspired by the Stanford course Natural Language Processing with Deep Learning.
[1] Dan Jurafsky and James H. Martin. Speech and Language Processing. 3rd ed. 2019.
[2] Jacob Eisenstein. Natural Language Processing. MIT Press. 2018.
[3] Yoav Goldberg. Neural Network Methods for Natural Language Processing. Synthesis Lectures on Human Language Technologies, 10(1):1–309. 2017.
[4] Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing. Journal of Artificial Intelligence Research, 57(1):345–420. 2016.
[5] Mark Johnson. How the Statistical Revolution Changes (Computational) Linguistics. In Proceedings of the EACL Workshop on the Interaction between Linguistics and Computational Linguistics, pp. 3–11. 2009.
[6] Conference on Mechanical Translation. MIT. 1952.