Section | 1 |
---|---|
Instructor(s) | Chaudhary, Amitabh (amitabh) |
Location | Ryerson 178 (Hybrid) |
Meeting Times | Wednesday 5:30pm - 8:30pm |
Fulfills | Elective Specialization - Data Analytics (DA-2) |

In this course we study the algorithms and the associated distributed computing systems used in analyzing massive datasets, or *big data,* and in large-scale machine learning. We also cover the foundations of reinforcement learning.

We focus on two fundamental ideas for scaling analysis to large datasets: (i) distributed computing, and (ii) randomization. In the former, we study how to design, implement, and evaluate data analysis algorithms for the distributed computing platforms MapReduce/Hadoop and Spark. In the latter, we explore techniques such as locality sensitive hashing, Bloom filters, and data stream mining. These techniques are the foundation of modern data analysis in companies such as Google, Facebook, and Netflix.
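To make the randomization idea concrete, here is a minimal Bloom filter sketch in Python. It is an illustration only, not course material: the class name `BloomFilter`, the method `might_contain`, the default sizes, and the use of salted SHA-256 as the hash family are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: false positives are possible, false negatives are not."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with the index i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for url in ["a.example/1", "a.example/2"]:
    seen.add(url)
```

The filter stores only a fixed number of bits regardless of how many items are added, which is why structures of this kind suit massive datasets; the price is a small, tunable false-positive rate.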

Reinforcement learning refers to the situation in which you want to model your environment, but you don’t have a data set for training. Instead you learn by interacting with your environment. We’ll learn algorithms that, e.g., teach themselves how to play chess by simply playing the game (against another copy of themselves) millions of times! They have applications in autonomous systems, robotics, operations research, responsive website design, stock trading, etc.
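As a rough illustration of learning by interaction, the sketch below runs tabular Q-learning on a toy five-state corridor. The environment, the function names `step` and `train`, the learning rate, the discount factor, and the uniformly random exploration policy are all assumptions made for this example; practical systems usually explore with something like an epsilon-greedy policy instead.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (illustrative values)


def step(state: int, action: int):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes: int = 500, seed: int = 0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Q-learning is off-policy, so acting uniformly at random still works here.
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted best next-state value.
            q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy read off the learned table for each non-terminal state.
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(N_STATES - 1)}
```

No labeled training set is involved: the agent only sees the rewards produced by its own moves, and the greedy policy derived from the table ends up moving right toward the goal in every state.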

A major component of the course is a quarter-long project in which students build a prototype system for solving a real-world data analysis problem. Examples of past student projects include movie recommendation systems, text analysis to predict stock movements, a reinforcement learning system for stock trading, diagnosing eye disease from retina images, adding components to Spark’s machine learning library, building a system to play the game Pong using reinforcement learning, and a deep learning system for lip reading.

**Topics** (tentative list)

- MapReduce framework
- Designing and analyzing MapReduce algorithms
- Spark framework
- Spark machine learning library (MLlib)
- Locality sensitive hashing for finding similar items
- Data stream mining
- Finite Markov decision processes
- Reinforcement learning algorithms: Sarsa and Q-learning
- Recommendation systems
- Other advanced data analysis/machine learning topics based on student interest

**Evaluation**

- Weekly Readings, Programming and Theory Assignments, Class Participation: 30%
- Three Quizzes: 35%
- Project: 35%

**Primary Textbook**

- Mining of Massive Datasets by Rajaraman, Leskovec, and Ullman; available free online.

**Prerequisites**

- MPCS 50103 Math for Computer Science
- MPCS 55001 Algorithms
- MPCS 51042 Python Programming (or Programming core requirement with prior knowledge of Python)
- MPCS 53110 Foundations of Computational Data Analysis
- MPCS 53111 Machine Learning

In all the above courses a grade of B+ or above is required.

The course requires mathematical, algorithmic, and programming maturity. Students are expected to know the following:

- Programming in Python: use of lists, dictionaries, conditionals, classes, and reading from and writing to files.
- Data structures such as trees, graphs, and hash tables.
- Basic multivariate calculus, including differentiation, integration, and finding maxima and minima.
- Basic linear algebra: vectors, matrices, and matrix multiplication.

Further, students should be prepared to learn new libraries, languages (e.g., Scala), and programming paradigms.

This course requires competency in Unix and Linux. Please plan to attend the MPCS Unix Bootcamp (https://masters.cs.uchicago.edu/page/mpcs-unix-bootcamp) or take the online MPCS Unix Bootcamp Course on Canvas.

This class is scheduled at a time that conflicts with these other classes:

- MPCS 52011-2 -- Introduction to Computer Systems
- MPCS 50103-1 -- Mathematics for Computer Science: Discrete Mathematics
- MPCS 51240-2 -- Product Management
- MPCS 56520-1 -- Advanced Security Topics
- MPCS 52555-1 -- Backends for Applications
- MPCS 56511-1 -- Introduction to Computer Security

Masters Program in Computer Science
Bx/MS in Computer Science (Option 2: Professionally-oriented - CS Majors)
Bx/MS in Computer Science (Option 3: Professionally-oriented - Non-CS Majors)