The foundational course in Databases (MPCS 53001) described how to model data and program a database in SQL. This advanced course takes you "behind the scenes" to examine in depth the architecture of a database system and the different data models it can support. We will understand, analyze, and develop efficient algorithms for operating on these data models, and build upon their implementations (main-memory/out-of-core; single-node/distributed) in modern databases. A primary goal of this course is to understand principles of data management (local or distributed) in the abstract, experiment with their implementation in relational database management systems, and appreciate the evolution of NoSQL and NewSQL databases.
The course will have two phases. In Phase 1 (first five weeks) we will study principles of data management within an RDBMS. In Phase 2 (next five weeks) we will consider their use in distributed and NoSQL databases. The course will include weekly/bi-weekly assignments and a project. The scope and emphasis of each topic can be tailored based on student preference and available time. We will cover the following topics:
- Introduction to Data Models
  - Hierarchical, Network, Relational, ER, Semantic, OO, OR
- Introduction to Database Internals
  - Architecture of a Database System
  - Indexing, join algorithms, query optimization
  - Storage management, cost estimation, and query processing
- Database layout for analytical databases
  - Differences between row and column stores; performance comparisons
- Parallel and Distributed Databases
  - Parallel database design
  - Distributed database design
- ACID Transactions and Distributed Transactions
  - Two-phase commit protocol and its applications
  - Concurrency control in distributed databases
- Recovery From Failures
  - Logging (REDO, UNDO, REDO/UNDO logs)
- Eventual Consistency and NoSQL databases
  - Consistency, Availability, Partition tolerance (CAP) theorem and Basically Available, Soft state, Eventual consistency (BASE) in KV stores
  - Architecture and variety of key-value (KV) stores and their use in applications
- Graph databases
  - Graph processing in databases
Each class will have assigned readings that introduce its topic, along with simple questions to test understanding. Readings will be chosen from reference books and articles describing the state of the art.
Assignments are meant to help you understand and evaluate a system or concept and, through reading articles, see how that concept applies in a real setting.
Depending on enrollment and TA support, students will either choose a class-provided project or work on a team project. A class-provided project is done individually. A team project is done in a group of two students. Students are expected to form their own teams and choose a project that (1) is relevant to the material discussed in class and (2) requires significant programming effort from both team members. The first class will provide examples of projects that satisfy these criteria.
- Reading Assignment and Class Participation: 10%
- Weekly Assignments: 60%
- Programming Project: 30%
Readings will be assigned from the following textbooks:
- Principles of Distributed Database Systems, by M. Tamer Ozsu and Patrick Valduriez. Prentice Hall, 1999, ISBN 0-13-659707-6.
- Readings in Database Systems (The Red Book), 4th ed., by Joseph Hellerstein and Michael Stonebraker. MIT Press, 2005. ISBN: 9780262693141.
- Database Management Systems (Third Edition) by R. Ramakrishnan and J. Gehrke, published by McGraw-Hill.
- Concurrency Control and Recovery in Database Systems by P. Bernstein, V. Hadzilacos and N. Goodman, published by Addison-Wesley, 1987.