|Instructor(s)||Chaudhary, Amitabh (amitabh)|
|Meeting Times||Wednesday 5:30pm - 8:30pm|
|Fulfills||Elective Specialization - Data Analytics (DA-2)|
Can we predict how people will vote based on their Twitter conversations? Can we identify pairs of researchers who will benefit from collaborating with each other based on their published articles? In this course we will study techniques for automatically detecting patterns and learning hidden structures in text data. Such techniques are of tremendous value because of the explosion in the amount of available text data and their potential benefit to the social sciences and businesses.
We will learn the fundamental steps in natural language processing, such as syntactic parsing (understanding the structure of a sentence) and semantic analysis (understanding the meaning of a sentence from the meanings of the words in it).
We will see that the primary challenge is that natural languages are ambiguous. For instance, the sentence "I made her duck" can be interpreted in five different ways. We will therefore focus on probabilistic and machine learning methods that learn to resolve ambiguity by training on large text corpora. These include sequence models such as Markov models, hidden Markov models, and conditional random fields. They also include classification and clustering techniques, such as logistic regression, naive Bayes, support vector machines, Gaussian mixture models, and EM clustering.
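As a rough illustration of the simplest sequence model named above, the sketch below estimates a first-order Markov (bigram) model from a handful of sentences and uses it to score word sequences. The toy corpus, the function names, and the choice of add-one smoothing are assumptions made for this example only; they are not taken from the course materials.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, for illustration only.
CORPUS = [
    "i made her dinner",
    "i made her a cake",
    "she made her duck",
    "i saw her duck",
]


def train_bigram(sentences):
    """Count bigrams, padding each sentence with <s> and </s> markers."""
    context_counts = Counter()        # how often each word appears as a history
    bigrams = defaultdict(Counter)    # bigrams[prev][curr] = count
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigrams[prev][curr] += 1
    # Every token that can be predicted: corpus words plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return context_counts, bigrams, vocab


def bigram_prob(prev, curr, model):
    """Add-one (Laplace) smoothed estimate of P(curr | prev)."""
    context_counts, bigrams, vocab = model
    return (bigrams[prev][curr] + 1) / (context_counts[prev] + len(vocab))


def sentence_prob(sentence, model):
    """Probability of a sentence as a product of smoothed bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(tokens, tokens[1:]):
        prob *= bigram_prob(prev, curr, model)
    return prob


model = train_bigram(CORPUS)
seen = sentence_prob("i made her duck", model)
shuffled = sentence_prob("duck her made i", model)
print(seen > shuffled)
```

Even with four training sentences, the model assigns a higher probability to a word order resembling the corpus than to a shuffled one. A model of this kind scores sequences but does not by itself choose among the readings of an ambiguous sentence; that is the role of richer models such as hidden Markov models, which add hidden states such as part-of-speech tags.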
Throughout the course we will both implement algorithms in Python and use Python-based libraries such as the Natural Language Toolkit (NLTK) to process real-world data.
A tentative list of topics follows.
MPCS 50103 Math for Computer Science
MPCS Programming core requirement
MPCS 53110 Foundations of Computational Data Analysis
MPCS 53111 Machine Learning
Equivalent courses or experience will be accepted with instructor permission.
This class is scheduled at a time that does not conflict with any other classes this quarter.