MPCS 53014 Big Data Application Architecture (Autumn 2022)

Section 1
Instructor(s) Spertus, Michael (spertus)
Location JCL 390
Meeting Times Thursday 5:30pm - 8:30pm
Fulfills Elective Specialization - Data Analytics (DA-2)


The purpose of this class is to learn how to build applications at scale, using techniques and tools capable of delivering subsecond response times to millions of users interacting with petabytes of data.

In this course, we will cover both the theory and practice of building Big Data applications. We will not only learn how to use technologies such as HDFS, MapReduce, Spark, Kafka, Hive, Thrift, HBase, Zookeeper, columnar stores, etc., but also understand why Big Data applications employ such a diverse array of technologies and where each one of them fits.

We will demonstrate the practice of Big Data application architecture by implementing a running Big Data web application that uses all of the weather and flight delay information in the United States over the last decade to explore the relationship between weather and flight performance.

To develop a sound understanding of the theory of Big Data, we will learn about important formulations of Big Data application architectures, such as Nathan Marz's lambda architecture, proper use of normalized and denormalized data stores within large-scale web applications, and application of the CAP theorem. We will also keep in mind additional topics that invariably arise in real-world applications of Big Data, such as budgeting and compliance.

Students are required to bring a laptop to class every week.

Course Contents

  • Overview of Big Data
  • Lambda architecture
  • Data model/storage
  • Batch layer
  • Serving layer
  • Speed layer
  • Technologies including Hadoop/Spark/Hive/HBase and other NoSQL databases/Thrift/Zookeeper etc.
  • Scraping and cleaning data
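
As a rough illustration of how the batch, serving, and speed layers listed above fit together, here is a minimal in-memory sketch in Python. It is not course material: the names MASTER_DATASET, build_batch_view, SpeedLayer, and average_delay are hypothetical, the delay records are made up, and plain dictionaries stand in for the distributed stores a real deployment would use.

```python
from collections import defaultdict

# Hypothetical master dataset: immutable (origin airport, delay in minutes) records.
MASTER_DATASET = [("ORD", 30), ("ORD", 10), ("MDW", 5)]


def build_batch_view(records):
    """Batch layer: recompute per-airport totals from the full master dataset."""
    view = defaultdict(lambda: [0, 0])  # airport -> [total delay, flight count]
    for airport, delay in records:
        view[airport][0] += delay
        view[airport][1] += 1
    return dict(view)


class SpeedLayer:
    """Speed layer: incrementally tracks records that arrived after the last batch run."""

    def __init__(self):
        self.view = defaultdict(lambda: [0, 0])

    def ingest(self, airport, delay):
        self.view[airport][0] += delay
        self.view[airport][1] += 1


def average_delay(airport, batch_view, speed_layer):
    """Serving-side query: merge the batch and real-time views at read time."""
    batch_total, batch_count = batch_view.get(airport, (0, 0))
    recent_total, recent_count = speed_layer.view.get(airport, (0, 0))
    count = batch_count + recent_count
    return (batch_total + recent_total) / count if count else None
```

In this sketch the batch view is rebuilt from scratch on each run, while the speed layer only covers records that arrived since the last rebuild; a query combines the two so that results stay current between batch runs.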

There will be weekly homework assignments on particular topics. At the end of the course, each student will build a Big Data web application on a topic of their choice. Past projects have included analyzing Divvy bike rental trends, looking at the effect of weather on Chicago crime data, protein folding, data mining Wikipedia, and more. In addition to being cool, discovering that one knows everything they need to develop a complete Big Data web application is a great experience.

Relationship to other MPCS Big Data courses
This course focuses on the topic of architecting large-scale Big Data applications. However, it only lightly touches on other Big Data-relevant topics like managing infrastructure in the public cloud or Big Data machine learning algorithms (we do discuss how to adapt traditional analytics queries to Big Data environments). While the course is complete on its own and will leave you in a position where you are comfortable building enterprise-grade Big Data web applications, regardless of what other courses you take, it also complements other Big Data courses. One useful way to view it would be that if you develop a powerful new Big Data analytic using machine learning techniques from an ML course, this course will teach you how to architect and implement a Big Data web application that leverages the new analytic and can be used by millions of users on petabytes of data. That application can then be deployed in the public cloud using PaaS and IaaS techniques taught in a Cloud Computing class.

Course Prerequisites

Core programming requirement

Other Prerequisites

Very basic programming skills in Java. Basic Linux IT skills.

This course requires competency in Unix and Linux. Please plan to attend the MPCS Unix Bootcamp or review the UChicago CS Student Resource Guide.

Overlapping Classes

This class is scheduled at a time that conflicts with these other classes:

  • MPCS 50103-1 -- Mathematics for Computer Science: Discrete Mathematics
  • MPCS 51083-1 -- Cloud Computing
  • MPCS 51082-1 -- Introduction to Unix Systems

Eligible Programs

  • Masters Program in Computer Science
  • MS in Computational Analysis in Public Policy (Year 2)
  • MA in Computational Social Science (Year 2)
  • Bx/MS in Computer Science (Option 2: Professionally-oriented - CS Majors)
  • Bx/MS in Computer Science (Option 3: Professionally-oriented - Non-CS Majors)