This course covers the fundamentals of natural language processing (NLP), including text representation, language modeling, and NLP tasks and paradigms. Case studies on open-source tools are used to illustrate techniques and trade-offs.
- Season: 2020, 1st semester
- Time: Tuesday, 07:00 - 08:40
- Location: PUC Minas, campus São Gabriel, building L, room 314
- Pre-requisites: Statistics, data structures and algorithms, and programming skills
- Instructor: Wladmir Cardoso Brandão
- Teaching assistants:
The main goal of this basic-level course is to provide an excursion into research in NLP, emphasizing text processing, language modeling, text embeddings, and NLP tasks through programming projects. Upon successful completion of this course, the student should be able to:
- Model text as vectors.
- Use text processing algorithms.
- Design learning approaches for NLP tasks.
The final grade will be determined by the grades in assignments, exams, and participation as follows:
- 65%: Assignments
  - 15%: Homeworks
  - 50%: Projects
- 25%: Final Exam
- 10%: Participation
Students will complete multiple homeworks during the course. These assignments are designed to reinforce the lectures and reading materials, and their due dates are posted on this website. Each homework will be graded out of a total of 100 points, and all homeworks are counted equally when computing the homeworks portion of the final grade.
Students will complete programming projects during the course. Each project is cumulative, i.e., students need to successfully complete each assignment in order to complete the next one, and we will not release solutions for the projects. Each project will be graded out of a total of 100 points, and all projects are counted equally when computing the projects portion of the final grade.
There will be one in-class exam during the course: a final exam at the end of the semester. The exam will be based on the mandatory readings and topics discussed in class.
Students must be actively involved in classes, performing the required activities. There are several ways of earning participation credit, including attending lectures, punctuality, completing feedback surveys, and behavioral aspects, such as not disturbing other students. Participation will be graded out of a total of 100 points, which determines the participation portion of the final grade.
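As a rough illustration of how the weights above combine, the following Python sketch computes a final grade from component scores. It assumes that every component is on a 0-100 scale and that homeworks and projects are each averaged with equal weight, as described above. The function name, variable names, and example scores are hypothetical and not part of any official grading tool.

```python
from statistics import mean

# Weights taken from the grading breakdown above (fractions of the final grade).
WEIGHTS = {
    "homeworks": 0.15,
    "projects": 0.50,
    "final_exam": 0.25,
    "participation": 0.10,
}


def final_grade(homeworks, projects, final_exam, participation):
    """Combine component scores (each on a 0-100 scale) into a final grade."""
    components = {
        "homeworks": mean(homeworks),  # homeworks are counted equally
        "projects": mean(projects),    # projects are counted equally
        "final_exam": final_exam,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * score for name, score in components.items())


# Hypothetical scores, for illustration only.
print(final_grade(homeworks=[90, 80, 100], projects=[70],
                  final_exam=60, participation=100))
```

Because projects carry half of the final grade, a low project score is hard to offset with the other components.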
Assignments and exams should be completed independently by each student, and any program code should always be appropriately commented. Students will be held responsible for all information presented in the assignments and must strictly follow the instructions provided for each assignment. Students should be sure to hand in assignments on time rather than waiting until the last minute to begin. Starting early will give students ample time to ask questions and obtain assistance. We recommend reserving late days only for legitimate emergencies.
Late penalties are a loss of a percentage of the original overall points for the assignment: any assignment handed in late will be marked down 25% per day. That is, after 4 days, the grade will be zero. Each late day constitutes a 24-hour extension, including all weekend days and holidays. Students cannot split late days into smaller increments. For instance, a submission that is 1 minute late will count as one day late.
In extreme circumstances, such as medical emergencies, we will grant no-penalty extensions. Please be prepared to provide written documentation, e.g., a doctor's note.
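The late-day rule can be sketched in Python as follows. This is only an illustration under stated assumptions: lateness is rounded up to whole 24-hour days, and the 25% per day is deducted from the score the submission would otherwise have earned. The function names and the example timestamps are hypothetical.

```python
import math
from datetime import datetime

PENALTY_PER_DAY = 0.25   # 25% per late day, as stated above
SECONDS_PER_DAY = 24 * 60 * 60


def late_days(due, submitted):
    """Number of late days, rounding any partial day up to a full day."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / SECONDS_PER_DAY)


def penalized_score(raw_score, days_late):
    """Apply the per-day penalty; the score never drops below zero."""
    # Assumption: the penalty is a fraction of the originally earned score.
    remaining = max(0.0, 1.0 - PENALTY_PER_DAY * days_late)
    return raw_score * remaining


# Hypothetical example: a submission one minute past the deadline.
due = datetime(2020, 3, 31, 23, 59)
submitted = datetime(2020, 4, 1, 0, 0)
days = late_days(due, submitted)
print(days, penalized_score(80, days))
```

Rounding up with `math.ceil` reflects the rule that late days cannot be split into smaller increments.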
Students should not look for assignment answers elsewhere. The use of pre-existing code is allowed provided that it is properly acknowledged. Students who demonstrably violate the Academic Honesty policy presented in the PUC Minas Student Guide will receive a failing grade in the course, and the case will be reported to the Administrative Board of the School, which could require suspension from all future work. Prohibited behaviors include:
- copying all or part of another person's work, even if you subsequently modify it
- viewing all or part of another student's work
- showing all or part of your work to another student
- consulting solutions from past semesters, or those found in books or on the Web
Not knowing or misunderstanding the rules, running out of time, submitting "the wrong version", or being overwhelmed with multiple demands are not acceptable excuses. There are no excuses for failure to uphold policies. Plagiarism checkers, such as the Moss system, can be used to screen submitted programs for plagiarism. Over the years, we have unfortunately had to fail students for copying on assignments. To avoid problems, limit any discussion of assignments with other students to clarification of the requirements or definitions of the problems, or to understanding the existing programs or general course material. Never discuss issues directly relevant to problem solutions.
This is a tentative schedule that lets students see what is coming up or what they will miss if absent, but changes can happen. Details of the schedule, materials, and reading lists will be updated as the course progresses. Periodic reading assignments from recent research articles will be given and should be read before the corresponding lecture. Most lectures will use slides to allow students to focus on understanding the material during class and reduce the need for taking notes. However, simply reading the slides is no substitute for attending class, in which additional explanation and discussion are presented.
# | Date | Topic or Activity | Material |
---|---|---|---|
1 | Mar 3 | First Lecture: course goals, activities and schedule | slide |
2 | Mar 10 | Introduction to NLP | slide, videos |
3 | Mar 17 | Text representation and vector space | slide, videos |
4 | Mar 24 | Text preprocessing | slide, videos |
5 | Mar 31 | Language modeling | slide, videos |
6 | Apr 7 | Language modeling | slide, videos |
7 | Apr 14 | Text classification | slide, videos |
8 | Apr 28 | NLP tasks and paradigms | slide, videos |
9 | May 5 | Tagging | slide, videos |
10 | May 12 | Sequence labeling | slide, videos |
11 | May 19 | Language generation | slide, videos |
12 | May 26 | Semantic similarity | slide, videos |
13 | Jun 2 | Introduction to neural networks | slide, videos |
14 | Jun 9 | Neural NLP | slide, videos |
15 | Jun 16 | Word embeddings | slide, videos |
16 | Jun 23 | Sentence embeddings | slide, videos |
17 | Jun 30 | Seminars | slide, videos |
18 | Jul 7 | Final exam | slide, videos |
ID | Assignment | Type | Release Date | Due Date | Solution |
---|---|---|---|---|---|
AS01 | Text preprocessing | homework | Mar 10, 2020 | Mar 31, 2020 @ 23:59 | Solution |
AS02 | Text classification | homework | Mar 31, 2020 | Apr 28, 2020 @ 23:59 | Solution |
AS03 | Entity recognition | homework | Apr 28, 2020 | May 19, 2020 @ 23:59 | Solution |
AS04 | Final project | project | Mar 3, 2020 | Jun 29, 2020 @ 23:59 | - |