Big Data

Title Big Data (53013)
Quarter Autumn 2017
Instructor Michael Spertus


Course Description

In this course, we will cover both the theory and practice of Big Data. We will use technologies such as HDFS, Kafka, Storm, Cassandra, Pig, Thrift, MapReduce, and more to implement a running Big Data web application that correlates all of the weather and flight delay information in the United States over the last decade to explore the relationship between weather and flight performance.


To develop a sound understanding of the theory of Big Data, we will use Marz and Warren's Big Data textbook, which provides a conceptual architecture for Big Data systems. We will also cover important additional topics that invariably arise in real-world applications of Big Data, such as budgeting and compliance.


Students are required to bring a laptop to class every week.


Course Contents

    Overview of Big Data

    Lambda architecture

    Data model/storage

    Batch layer

    Serving layer

    Speed layer

    Tools including Hadoop/Pig/NoSQL databases, etc.

    Scraping and cleaning data



There will be weekly homework assignments on particular topics. At the end of the course, each student will build a Big Data web application on a topic of their choice. Past projects have included analyzing Divvy bike rental trends, examining the effect of weather on Chicago crime data, protein folding, data mining Wikipedia, and more. In addition to being cool, discovering that you know everything you need to develop a complete Big Data web application is a great experience.


Course Textbook

All students should purchase and download a copy of Marz and Warren’s Big Data from

Prerequisites (Courses)

Core Programming

Prerequisites (Other)

Very basic programming skills in Java. Basic Linux IT skills.


Data Analytics Specialization


Monday 5:30-8:30 pm


Ryerson 251