This course is open to MPCS students only.
This course introduces the fundamental concepts and techniques in data mining, machine learning, and statistical modeling, and the practical know-how to apply them to real-world data through Python-based software. The course examines in detail topics in both supervised and unsupervised learning. These include linear and logistic regression and regularization; classification using decision trees, nearest neighbors, naive Bayes, boosting, random trees, and artificial/convolutional neural networks; clustering using k-means and expectation-maximization; and dimensionality reduction through PCA and SVD. Students use Python and Python libraries such as NumPy, SciPy, matplotlib, and pandas for implementing algorithms and analyzing data.
Apart from lectures, we conduct optional but strongly recommended problem sessions. During these sessions, the TAs present homework solutions and other optional material. These sessions are the only source for homework solutions; in particular, we do not publish any solutions. There are also no plans to record or stream the sessions. In Spring 2019, the problem sessions are most likely to be held on Sunday afternoons, but they may be moved to Saturdays based on TA availability.
1. B+ or above in MPCS 51042 Python Programming (or in Programming core requirement with prior knowledge of Python)
2. B+ or above in MPCS 50103 Math for Computer Science or passing the corresponding placement exam
3. B+ or above in MPCS 55001 Algorithms
4. B or above in MPCS 53110 Foundations of Computational Data Analysis or passing the corresponding placement exam
If your grades in these classes do not meet the minimum requirements, please contact the instructor to discuss your background.
Univariate calculus and basic multivariate calculus (double integrals, partial derivatives, integration by parts, Taylor series).
This course assumes both mathematical maturity and programming fluency. In particular, students are expected to code complicated machine learning algorithms from scratch (without a template) and debug them on their own.
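To give a rough sense of what coding an algorithm "from scratch" can look like, here is a minimal sketch of k-means clustering (one of the topics listed above) written with plain NumPy. It is an illustration only, not course material: the function name `kmeans`, its parameters, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def kmeans(points, k, n_iters=100, seed=0):
    """Lloyd's algorithm for k-means, using only NumPy (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Two synthetic, well-separated blobs (made-up data for illustration only).
    data = np.vstack([
        rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
        rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
    ])
    centroids, labels = kmeans(data, k=2)
    print(centroids)
```

The sketch deliberately avoids library clustering routines, so every step (initialization, assignment, update, convergence check) is visible and can be debugged directly.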
Not approved for CAPP or MACS students.