MPCS 53113 Natural Language Processing (Spring 2024)

Section 1
Instructor(s) Chaudhary, Amitabh (amitabh)
Location Ryerson 276
Meeting Times Monday 5:30pm - 8:30pm
Fulfills Elective Specialization - Data Analytics (DA-2)

Syllabus

Natural language processing (NLP) is the application of computational techniques, particularly from machine learning, to analyze and synthesize human language. The recent explosion in the amount of available text data has made natural language processing invaluable in business, the social sciences, and even the natural sciences.
 
In this course we study the fundamentals of modern natural language processing, emphasizing models based on deep learning. These include language models, word embeddings, recurrent neural networks (Simple RNNs, LSTMs), context-free grammars and syntactic parsing, dependency parsing, and attention-based models such as the transformer and BERT.
 
We use Python and Python-based libraries such as PyTorch, NLTK, and spaCy for implementing algorithms and processing text.
 
A significant component is the course project, in which students apply NLP techniques to solve a real-world problem.
 


Topics
A tentative list of topics follows.

  • Language Models
  • Word Embeddings
  • Recurrent Neural Networks (RNNs), LSTMs for NLP
  • Convolutional Neural Networks (CNNs) for NLP
  • Conditioned Generation, Sequence-to-Sequence Models
  • Attention-Based Models, The Transformer
  • Syntactic Parsing, Context-Free Grammars
  • BERT 

Coursework and Evaluation

  • Assignments: There will be approximately one assignment per week for the first half of the course. (In the second half, there will be fewer assignments to allow students to focus on their projects.) The assignments are designed to reinforce the material and test a deeper understanding of the concepts and algorithms through theoretical questions, program implementation, and analysis of empirical results. Tentatively worth 40% of the grade.
  • Midterm Examination: Tentatively worth 20% of the grade.
  • Course Project: Tentatively worth 40% of the grade.

Textbook

  • Neural Network Methods for Natural Language Processing by Yoav Goldberg (https://doi.org/10.2200/S00762ED1V01Y201703HLT037) 

Course Prerequisites

This course requires you to have completed (or be concurrently taking) MPCS 53111 Machine Learning OR have completed MPCS 53120 Applied Data Analysis.

Further, it requires you to have the following grades:

  • B+ or better in MPCS 51042 Python Programming
  • B or better in MPCS 55001 Algorithms
  • B or better in one of the following courses:
      • MPCS 53110 Foundations of Computational Data Analysis
      • MPCS 53120 Applied Data Analysis

Other Prerequisites

Programming experience in Python.

This course requires competency in Unix and Linux. Please plan to attend the MPCS Unix Bootcamp (https://masters.cs.uchicago.edu/page/mpcs-unix-bootcamp) and/or review the UChicago CS Student Resource Guide here: https://uchicago-cs.github.io/student-resource-guide/.

Overlapping Classes

This class is scheduled at a time that conflicts with these other classes:

  • MPCS 51083-2 -- Cloud Computing
  • MPCS 52018-1 -- Advanced Computer Architecture
  • MPCS 51032-1 -- Advanced iOS Application Development
  • MPCS 53120-1 -- Applied Data Analysis
  • MPCS 51050-1 -- OO Architecture: Patterns, Technologies, Implementations

Eligible Programs

  • MA in Computational Social Science (Year 2)
  • Bx/MS in Computer Science (Option 1: Research-Oriented)
  • Bx/MS in Computer Science (Option 2: Professionally-oriented - CS Majors)
  • Bx/MS in Computer Science (Option 3: Professionally-oriented - Non-CS Majors)
  • Masters Program in Computer Science