Natural Language Processing

Title Natural Language Processing (53113)
Quarter Summer 2019
Instructor Amitabh Chaudhary (amitabh@cs.uchicago.edu)
Website

Syllabus Can we predict how people will vote based on their Twitter conversations? Can we identify pairs of researchers who will benefit from collaborating with each other based on their published articles? In this course we will study techniques for automatically detecting patterns and learning hidden structures in text data. Such techniques are of tremendous value because of the explosion in the amount of available text data and their potential benefit to the social sciences and businesses.

We will learn the fundamental steps in natural language processing, such as syntactic parsing (understanding the structure of a sentence) and semantic analysis (understanding the meaning of a sentence from the meanings of the words in it). These will help us build sophisticated models for text classification, such as detecting sentiment or identifying fake news.

We will see that a primary challenge is that natural languages are ambiguous. For instance, the sentence "I made her duck" can be interpreted in five different ways! So our models are probabilistic, and we resolve the ambiguity by training on large text corpora.
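As a rough illustration of resolving ambiguity probabilistically, the sketch below estimates how likely the word "duck" is to be a noun or a verb from relative frequencies. The tiny hand-tagged corpus, the tag names, and the function tag_probabilities are all hypothetical and made up for this example; real models are trained on far larger corpora.

```python
from collections import Counter

# Hypothetical hand-tagged toy corpus of (word, part-of-speech) pairs.
# Invented for illustration only; not real course data.
TAGGED = [
    ("the", "DET"), ("duck", "NOUN"), ("swam", "VERB"),
    ("her", "PRON"), ("duck", "NOUN"), ("quacked", "VERB"),
    ("they", "PRON"), ("duck", "VERB"), ("quickly", "ADV"),
]


def tag_probabilities(word, tagged):
    """Estimate P(tag | word) by relative frequency in the tagged corpus."""
    counts = Counter(tag for w, tag in tagged if w == word)
    total = sum(counts.values())
    # An unseen word yields an empty dict (no division takes place).
    return {tag: n / total for tag, n in counts.items()}


probs = tag_probabilities("duck", TAGGED)
# Pick the tag with the highest estimated probability.
best_tag = max(probs, key=probs.get)
print(probs, best_tag)
```

In this toy corpus "duck" appears twice as a noun and once as a verb, so the noun reading is preferred. A practical tagger would also use the surrounding words, which is what models such as hidden Markov models add.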

We will study a variety of models in the context of text processing including Markov and hidden Markov models, naive Bayes, logistic regression, and neural networks.

All through the course we will use Python and libraries such as the Natural Language Toolkit (NLTK) for processing real-world data.

Topics
A tentative list of topics follows.
  • Text processing applications, ambiguity in natural languages
  • The Natural Language Toolkit
  • N-gram language models
  • Information Retrieval
  • Text Classification, Naive Bayes
  • Logistic Regression
  • Neural Networks
  • Part-of-Speech Tagging
  • Syntactic and Statistical Parsing
  • Semantic Analysis
  • Information Extraction
Coursework and Evaluation
  • Assignments: These will be weekly for the first half of the course and will consist of theoretical and programming questions to help students develop a deeper understanding of the material. They will be worth 25% of the grade.
  • Quizzes: There will be three in-class quizzes that will test the fundamental concepts. They will be worth 40% of the grade.
  • Course project: Students will work on a project of their choice, individually or in teams of two. This will be worth 30% of the grade.
  • Readings: Students will read assigned material to prepare for each class and answer review questions. These, along with class participation, will count for 5% of the grade.
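The four weights above sum to 100%. A minimal sketch of how they might combine follows, assuming each component is scored on a 0 to 100 scale and the final grade is a simple weighted average. That scoring scheme, the function course_grade, and the example scores are assumptions for illustration, not stated course policy.

```python
# Weights taken from the list above; they sum to 1.0.
WEIGHTS = {
    "assignments": 0.25,
    "quizzes": 0.40,
    "project": 0.30,
    "readings": 0.05,
}


def course_grade(scores):
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores for one student.
example = {"assignments": 90, "quizzes": 80, "project": 85, "readings": 100}
total = course_grade(example)
print(round(total, 1))
```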
Textbook
Readings will be assigned primarily from Speech and Language Processing by D. Jurafsky and J.H. Martin. Material for other readings will be available on Canvas electronic reserves.
Prerequisites (Courses)

A grade of B+ or better in the following courses:
• MPCS 50103 Math for Computer Science (or placement exam waiver)
• MPCS 51042 Python Programming
• MPCS 55001 Algorithms

A grade of B or better in one of the following courses:
• MPCS 53110 Foundations of Computational Data Analysis
• MPCS 53120 Applied Data Analysis

MPCS 53111 Machine Learning (recommended; see below)
Equivalent courses or experience will be accepted with instructor permission. A prior course in machine learning would be useful but is not necessary; if you haven't taken one, please contact the instructor with your prior courses and experience.

Prerequisites (Other)

Programming experience in Python.

Satisfies

Elective
Data Analytics Specialization (https://masters.cs.uchicago.edu/page/data-analytics)

Time

Tuesday 5:30-8:30 PM

Location

Ryerson 251