Natural Language Processing
Title: Natural Language Processing (53113)
Instructor: Amitabh Chaudhary (firstname.lastname@example.org)
Syllabus: Can we predict how people will vote based on their Twitter conversations? Can we identify pairs of researchers who would benefit from collaborating, based on their published articles? In this course we will study techniques for automatically detecting patterns and learning hidden structures in text data. Such techniques are of tremendous value given the explosion in the amount of available text data and their potential benefit to the social sciences and to businesses.
We will learn the fundamental steps in natural language processing, such as syntactic parsing (understanding the structure of a sentence) and semantic analysis (deriving the meaning of a sentence from the meanings of its words). These steps will help us build sophisticated models for text classification, for example for detecting sentiment or identifying fake news.
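To make text classification concrete, here is a minimal sketch of a sentiment classifier using multinomial naive Bayes with add-one (Laplace) smoothing, one of the models covered in the course. The function names `train_naive_bayes` and `classify` and the four training sentences are hypothetical, invented for this illustration; this is not course code, and a real classifier would be trained on a large labeled corpus.

```python
import math
from collections import Counter


def train_naive_bayes(docs):
    """Train multinomial naive Bayes on (tokens, label) pairs (hypothetical helper)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in label_counts}
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)

    # log P(c): fraction of training documents with label c
    log_prior = {c: math.log(n / len(docs)) for c, n in label_counts.items()}

    # log P(w | c) with add-one smoothing, so a word never seen with
    # class c still gets a small nonzero probability
    log_likelihood = {}
    for c, counts in word_counts.items():
        total = sum(counts.values())
        log_likelihood[c] = {
            w: math.log((counts[w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, log_prior, log_likelihood, vocab):
    """Return the label with the highest log posterior score."""
    scores = {}
    for c in log_prior:
        # Summing logs avoids floating-point underflow from multiplying many
        # small probabilities; words outside the training vocabulary are skipped
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in tokens if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy training data, invented for illustration
train = [
    ("a great fun film".split(), "pos"),
    ("loved the great acting".split(), "pos"),
    ("a boring dull plot".split(), "neg"),
    ("the acting was dull".split(), "neg"),
]
model = train_naive_bayes(train)
print(classify("great fun".split(), *model))  # pos
print(classify("dull plot".split(), *model))  # neg
```

Working in log space is the usual design choice here: a long document multiplies hundreds of small probabilities, which would underflow to zero as plain floats.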
We will see that a primary challenge is that natural languages are ambiguous. For instance, the sentence "I made her duck" can be interpreted in five different ways! Our models are therefore probabilistic, and we resolve the ambiguity by training them on large text corpora.
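The structural side of this ambiguity can be made concrete by counting parse trees. The sketch below assumes a small hand-written grammar in Chomsky normal form (the rules, `LEXICON`, and `count_parses` are hypothetical, written only for this illustration) and counts parses with CKY-style dynamic programming. This toy grammar captures just two readings: "made [her duck]", with "her" as a possessive, and "made [her] [duck]", with "duck" as a verb. The remaining readings involve other structures or other senses of "made", which this grammar leaves out.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, written for this sketch.
# Each binary rule is (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),   # made [her duck]: "her" is a possessive determiner
    ("VP", "V", "SC"),   # made [her] [duck]: "her" is an object, "duck" a verb
    ("SC", "NP", "VB"),  # small clause: subject NP followed by a bare verb
    ("NP", "Det", "N"),
]
# Lexical rules: each word maps to every category it can belong to
LEXICON = {
    "I": {"NP"},
    "made": {"V"},
    "her": {"Det", "NP"},
    "duck": {"N", "VB"},
}


def count_parses(words, start="S"):
    """Count parse trees for `words` with CKY-style dynamic programming."""
    n = len(words)
    # chart[i][j][cat] = number of ways category `cat` can span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] += 1
    # Fill longer spans from shorter ones, trying every split point k
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]


print(count_parses("I made her duck".split()))  # 2
```

A probabilistic grammar would attach a probability to each rule and keep the most likely tree instead of a count; the probabilities are what training on a corpus supplies.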
We will study a variety of models for text processing, including Markov and hidden Markov models, naive Bayes, logistic regression, and neural networks.
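As an illustration of the simplest of these, a bigram Markov model assumes that each word depends only on the word immediately before it, and estimates P(w_i | w_{i-1}) from relative frequencies in a corpus. The following is a minimal, unsmoothed sketch; the helper names and the three-sentence corpus are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Estimate P(w_i | w_{i-1}) by maximum likelihood (relative frequency)."""
    bigram_counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence boundaries so the model also learns
        # which words tend to start and end sentences
        tokens = ["<s>"] + sentence + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            bigram_counts[prev][curr] += 1

    model = {}
    for prev, counts in bigram_counts.items():
        total = sum(counts.values())
        model[prev] = {w: c / total for w, c in counts.items()}
    return model


def sentence_probability(model, sentence):
    """Multiply bigram probabilities under the Markov assumption."""
    tokens = ["<s>"] + sentence + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        # Unsmoothed maximum likelihood assigns probability 0 to any unseen bigram
        prob *= model.get(prev, {}).get(curr, 0.0)
    return prob


corpus = [
    "I am Sam".split(),
    "Sam I am".split(),
    "I do not like green eggs and ham".split(),
]
lm = train_bigram_model(corpus)
print(lm["<s>"]["I"])                                    # about 0.667 (= 2/3)
print(sentence_probability(lm, "I am Sam".split()))      # about 0.111 (= 1/9)
print(sentence_probability(lm, "Sam likes ham".split())) # 0.0
```

The zero probability for an unseen sentence shows why smoothing is needed in practice; a hidden Markov model extends the same chain idea to sequences of unobserved states, such as part-of-speech tags.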
Throughout the course we will use Python and libraries such as the Natural Language Toolkit (NLTK) to process real-world data.
A tentative list of topics follows.
Readings will be assigned primarily from Speech and Language Processing by D. Jurafsky and J. H. Martin. Other readings will be available through Canvas electronic reserves.
A grade of B+ or better in the following courses:
A grade of B or better in one of the following courses:
MPCS 53111 Machine Learning (recommended; see below)
Programming experience in Python.
Tuesday 5:30-8:30 PM