Big Data

Title Big Data (53013)
Quarter Autumn 2015
Instructor Michael Spertus (spertus@cs.uchicago.edu)
Website

Syllabus

Course Description
In this course, we will cover both the theory and practice of Big Data. We will use technologies such as HDFS, Kafka, Storm, Cassandra, Pig, Thrift, MapReduce, and more to implement a running Big Data web application that correlates all of the weather and flight delay information in the United States over the last decade to explore the relationship between weather and flight performance.

To develop a sound understanding of the theory of Big Data, we will use Marz and Warren's Big Data textbook, which provides a conceptual architecture for Big Data systems. We will also cover important additional topics that invariably arise in real-world applications of Big Data, such as budgeting and compliance.

Students are required to bring a laptop to class every week.

Course Contents

  • Overview of Big Data
  • Lambda architecture
  • Data model/storage
  • Batch layer
  • Serving layer
  • Speed layer
  • Tools including Hadoop/Pig/NoSQL databases, etc.
  • Scraping and cleaning data
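
As a rough illustration of how the batch, serving, and speed layers listed above fit together, here is a minimal in-memory sketch in Python. It is not course material: the sample data, the names MASTER_DATASET, RECENT_EVENTS, build_view, and average_delay are all hypothetical, and a real system would use tools such as Hadoop or Storm rather than Python dictionaries. The idea it shows is that a query is answered by merging a view precomputed from all historical data with a view covering only data that arrived since the last batch run.

```python
from collections import defaultdict

# Hypothetical master dataset: immutable (origin_airport, delay_minutes) facts.
MASTER_DATASET = [
    ("ORD", 30),
    ("ORD", 10),
    ("SFO", 5),
]

# Hypothetical events that arrived after the last batch run.
RECENT_EVENTS = [
    ("ORD", 20),
    ("SFO", 15),
]


def build_view(events):
    """Aggregate events into {airport: [total_delay, flight_count]}."""
    view = defaultdict(lambda: [0, 0])
    for airport, delay in events:
        view[airport][0] += delay
        view[airport][1] += 1
    return dict(view)


def average_delay(airport, batch_view, realtime_view):
    """Answer a query by merging the batch view with the realtime view."""
    total, count = 0, 0
    for view in (batch_view, realtime_view):
        view_total, view_count = view.get(airport, (0, 0))
        total += view_total
        count += view_count
    return total / count if count else None


# Batch layer: view recomputed from the entire master dataset.
batch_view = build_view(MASTER_DATASET)
# Speed layer: view covering only the recent events.
realtime_view = build_view(RECENT_EVENTS)

print(average_delay("ORD", batch_view, realtime_view))
```

Storing totals and counts rather than averages is a deliberate choice in this sketch: sums can be merged across the two views, whereas two averages cannot be combined without knowing how many flights each one covers.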

Coursework
There will be weekly homework assignments on particular topics. At the end of the course, each student will build a Big Data web application on a topic of their choice. Past projects have included analyzing Divvy bike rental trends, looking at the effect of weather on Chicago crime data, protein folding, data mining Wikipedia, and more. In addition to being cool, discovering that you know everything you need to develop a complete Big Data web application is a great experience.

Course Textbook
All students should purchase and download a copy of Marz and Warren’s Big Data from http://www.manning.com/marz/.

Prerequisites (Courses)

Core programming requirement

Prerequisites (Other)

Very basic programming skills in Java. Basic Linux IT skills.

Satisfies

Specialization - Data Analytics

Time

Tuesdays, 5:30-8:30

Location

Ryerson 276