Natural Language Processing
|Title||Natural Language Processing (53113)|
|Instructor||Amitabh Chaudhary (firstname.lastname@example.org)|
|Syllabus||Can we predict how people will vote based on their Twitter conversations? Can we identify pairs of researchers who would benefit from collaborating with each other based on their published articles? In this course we will study techniques for automatically detecting patterns and learning hidden structures in text data. Such techniques are of tremendous value because of the explosion in the amount of available text data and their potential benefit to the social sciences and to businesses.
We will learn the fundamental steps in natural language processing, such as syntactic parsing (understanding the structure of a sentence) and semantic analysis (understanding the meaning of a sentence from the meanings of the words in it).
We will see that the primary challenge is that natural languages are ambiguous. For instance, the sentence "I made her duck" can be interpreted in five different ways. So we will focus on probabilistic and machine learning mechanisms that learn to resolve ambiguity by training on large text corpora. These include sequence models such as Markov models, hidden Markov models, and conditional random fields. They also include classification and clustering techniques, such as logistic regression, naive Bayes, support vector machines, Gaussian mixture models, and EM clustering.
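To give a flavor of how a sequence model can choose between readings, here is a minimal sketch of Viterbi decoding over a toy hidden Markov model for part-of-speech tagging. The tag set, the tables `START`, `TRANS`, and `EMIT`, and every probability in them are invented for illustration; they are not estimated from a corpus and are not course material. In practice the probabilities would be learned from tagged text, and the products would usually be computed in log space to avoid underflow on long sentences.

```python
# Toy hidden Markov model for part-of-speech tagging.
# Every probability below is an invented, illustrative value.
START = {"PRON": 0.6, "DET": 0.2, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "PRON": {"VERB": 0.8, "NOUN": 0.1, "DET": 0.05, "PRON": 0.05},
    "VERB": {"DET": 0.4, "PRON": 0.3, "NOUN": 0.2, "VERB": 0.1},
    "DET": {"NOUN": 0.9, "VERB": 0.1},
    "NOUN": {"VERB": 0.5, "NOUN": 0.3, "DET": 0.1, "PRON": 0.1},
}
EMIT = {
    "PRON": {"i": 0.5, "her": 0.5},
    "VERB": {"made": 0.7, "duck": 0.3},
    "DET": {"her": 1.0},
    "NOUN": {"duck": 1.0},
}


def viterbi(words, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable tag sequence for a non-empty list of words."""
    tags = list(start)
    # best[t][tag]: probability of the best path that ends in `tag` at position t
    best = [{tag: start[tag] * emit[tag].get(words[0], 0.0) for tag in tags}]
    back = []  # back[t-1][tag]: previous tag on that best path
    for word in words[1:]:
        scores, pointers = {}, {}
        for tag in tags:
            # Best predecessor for `tag`, ignoring the emission (same for all predecessors).
            prev = max(tags, key=lambda p: best[-1][p] * trans[p].get(tag, 0.0))
            scores[tag] = (
                best[-1][prev] * trans[prev].get(tag, 0.0) * emit[tag].get(word, 0.0)
            )
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


sentence = "i made her duck".split()
tagged = list(zip(sentence, viterbi(sentence)))
print(tagged)
```

Under these made-up numbers the model prefers the reading in which "her" is a possessive determiner and "duck" is a noun; raising the probability of a pronoun after a verb, or of "duck" as a verb, would tip it toward the other reading.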
Throughout the course we will both implement algorithms in Python and use Python-based libraries such as the Natural Language Toolkit (NLTK) for processing real-world data.
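As an indication of what implementing an algorithm directly in Python can look like, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one smoothing. It uses only the standard library; the function names `train_naive_bayes` and `classify`, the whitespace tokenization, and the four training sentences are assumptions made for this example, not part of the course materials or of NLTK.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Collect counts from (text, label) pairs; returns the model as a tuple."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for `text`."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)  # log prior
        for token in text.lower().split():
            if token not in vocab:
                continue  # words never seen in training are skipped
            # Add-one (Laplace) smoothing keeps unseen word/label pairs nonzero.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Tiny invented training set, for illustration only.
training = [
    ("good great film", "pos"),
    ("great acting good plot", "pos"),
    ("bad boring film", "neg"),
    ("boring plot bad acting", "neg"),
]
model = train_naive_bayes(training)
print(classify("good film", model))
print(classify("boring acting", model))
```

Working in log space turns the product of word probabilities into a sum, which avoids numerical underflow on longer documents.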
A tentative list of topics follows.
Readings will be assigned primarily from Speech and Language Processing by D. Jurafsky and J. H. Martin. Material for other readings will be available on Chalk electronic reserves.
|Prerequisites||MPCS 50103 Math for Computer Science. Equivalent courses or experience will be accepted with instructor permission.|
Wednesdays 5:30 - 8:30