MPCS 53013 Big Data (Autumn 2017)

Section 1
Instructor(s) Spertus, Michael (spertus)
Location Ryerson 251
Meeting Times Monday 5:30pm - 8:30pm
Fulfills

Syllabus

Course Description

In this course, we will cover both the theory and practice of Big Data. We will use technologies such as HDFS, Kafka, Storm, Cassandra, Pig, Thrift, MapReduce, and more to implement a running Big Data web application that correlates all of the weather and flight delay information in the United States over the last decade to explore the relationship between weather and flight performance.

To develop a sound understanding of the theory of Big Data, we will use Marz and Warren's Big Data textbook, which provides a conceptual architecture for Big Data systems. We will also cover important additional topics that invariably arise in real-world applications of Big Data, such as budgeting and compliance.

Students are required to bring a laptop to class every week.

Course Contents

    Overview of Big Data

    Lambda architecture

    Data model/storage

    Batch layer

    Serving layer

    Speed layer

    Tools including Hadoop/Pig/NoSQL databases, etc.

    Scraping and cleaning data

 

Coursework

There will be weekly homework assignments on particular topics. At the end of the course, each student will build a Big Data web application on a topic of their choice. Past projects have included analyzing Divvy bike rental trends, examining the effect of weather on Chicago crime data, protein folding, data mining Wikipedia, and more. In addition to being cool, discovering that you know everything you need to develop a complete Big Data web application is a great experience.

Course Textbook

All students should purchase and download a copy of Marz and Warren’s Big Data from http://www.manning.com/marz/

Course Prerequisites

Core Programming

Other Prerequisites

Very basic programming skills in Java. Basic Linux IT skills.

Overlapping Classes

This class is scheduled at a time that conflicts with these other classes:

  • MPCS 51040-1 -- C Programming
  • MPCS 52011-1 -- Introduction to Computer Systems
  • MPCS 51100-1 -- Advanced Programming
  • MPCS 51042-2 -- Python Programming
  • MPCS 51400-1 -- Functional Programming