In this course, we will cover both the theory and practice of Big Data. We will use technologies such as HDFS, Kafka, Storm, Cassandra, Pig, Thrift, MapReduce, and more to implement a running Big Data web application that correlates all of the weather and flight delay information in the United States over the last decade, exploring the relationship between weather and flight performance.
To develop a sound understanding of the theory of Big Data, we will use Marz and Warren's Big Data textbook, which provides a conceptual architecture for Big Data systems. We will also cover important additional topics that invariably arise in real-world applications of Big Data, such as budgeting and compliance.
Students are required to bring a laptop to class every week.
- Overview of Big Data
- Lambda architecture
- Data model/storage
- Batch layer
- Serving layer
- Speed layer
- Tools including Hadoop/Pig/NoSQL databases, etc.
- Scraping and cleaning data
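
As a rough illustration of how the batch, serving, and speed layers listed above fit together, here is a minimal sketch in plain Python. It is not the course's implementation: the in-memory lists stand in for a master dataset and a stream of recent events, the airport codes and delay values are made up, and the names `build_view` and `average_delay` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records of (airport, delay_minutes); made-up values, not real data.
master_dataset = [("ORD", 30), ("ORD", 10), ("MDW", 5)]  # everything seen so far
recent_events = [("ORD", 20), ("MDW", 15)]               # arrived since the last batch run


def build_view(records):
    """Aggregate records into {airport: [total_delay, flight_count]}."""
    view = defaultdict(lambda: [0, 0])
    for airport, delay in records:
        view[airport][0] += delay
        view[airport][1] += 1
    return dict(view)


def average_delay(airport, batch_view, speed_view):
    """Answer a query by merging the batch view with the real-time view."""
    total, count = 0, 0
    for view in (batch_view, speed_view):
        view_total, view_count = view.get(airport, (0, 0))
        total += view_total
        count += view_count
    return total / count if count else None


# Batch layer: view recomputed from the full master dataset.
batch_view = build_view(master_dataset)
# Speed layer: view covering only the data the batch run has not yet seen.
speed_view = build_view(recent_events)
# Serving-side query: combine both views to get an up-to-date answer.
ord_average = average_delay("ORD", batch_view, speed_view)
```

The views store totals and counts rather than averages so that the batch and real-time results can be merged correctly at query time.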
There will be weekly homework assignments on particular topics. At the end of the course, each student will build a Big Data web application on a topic of their choice. Past projects have included analyzing Divvy bike rental trends, looking at the effect of weather on Chicago crime data, protein folding, data mining Wikipedia, and more. In addition to being cool, discovering that you know everything you need to develop a complete Big Data web application is a great experience.
All students should purchase and download a copy of Marz and Warren’s Big Data from http://www.manning.com/marz/.