NOTE: This is a tentative syllabus. The MPCS is still working with Dr. Malik on preparing a final syllabus for the course.
The objective of this course is to (i) expand students' knowledge by covering new topics that represent the state of the art in database management systems and distributed systems, and (ii) build upon the foundations developed in MPCS 53001 - Databases by covering topics in greater depth. The following topics form the basis of the course content:
Distributed database design; query processing.
Two phase commit protocol and its applications.
Concurrency control in distributed databases.
Architecture and variety of key-value (KV) stores and their use in applications.
Consistency-Availability-Partition tolerance (CAP) theorem and Basically Available, Soft state, Eventual consistency (BASE) in KV stores.
Materialized views and Online Analytical Processing (OLAP) querying in data
Modern systems for processing OLAP workloads, such as HadoopDB, Hive.
Alternative Data Storage & Models
Difference between row vs column stores; Performance comparisons.
Introduction to column and array stores (SciDB).
Graph processing in databases. Introduction to tree databases, such as Extensible Markup Language (XML) databases.
Modern large-scale graph data management, e.g., Pregel, Tao, etc.
Object-Relational databases and Cloud databases.
Object-relational mappers (ORM), such as SQLAlchemy, PostgresORM.
Relational services in the cloud; Performance issues.
GiST indexes in PostGIS, space-filling curves, geohashing.
There will be readings that introduce the topic covered in each class, and simple questions to test understanding. Readings shall be chosen from reference books and articles describing the state-of-the-art.
The goal of the assignments is to help students understand and evaluate a system or concept and, by reading articles, see how that concept applies in a real setting.
The primary component of this course will be a class project carried out in groups of 2-3 students. Students are expected to organize into groups and choose a project that (1) is relevant to the materials discussed in class and (2) requires a significant programming effort from all team members.
For instance, being able to compare the performance of different DBMSs and of different storage and access techniques is vital for the database community, and requires a sufficient understanding of how underlying resources such as CPU, memory, and disk are used. Some projects that will help students understand the tradeoffs are:
The best database for graph data: comparing a graph database, KV stores, and a relational database for storing graphs.
An implementation of basic array processing in Hadoop.
Developing a benchmark for NoSQL databases using real-world data, such as that from Wikimedia.
Investigating performance and scalability characteristics of Amazon RDS. Also, since RDS services run in a virtualized environment, studying the “stability” and “isolation” of the performance offered would be interesting.
Designing a hybrid OLTP/OLAP database system.
A performance dashboard for a distributed relational/KV database.
Reading Assignment and Class Participation: 10%
Weekly Assignments: 60%
Programming Project: 30%
Readings will be assigned from the following textbooks:
Principles of Distributed Database Systems, by M. Tamer Ozsu and Patrick Valduriez. Prentice Hall, 1999, ISBN 0-13-659707-6.
Readings in Database Systems (The Red Book). 4th ed., by Joseph Hellerstein and Michael Stonebraker, MIT Press, 2005. ISBN: 9780262693141.
Database Management Systems (Third Edition) by R. Ramakrishnan and J. Gehrke, published by McGraw-Hill.
Concurrency Control and Recovery in Database Systems by P.A. Bernstein, V. Hadzilacos and N. Goodman, published by Addison-Wesley, 1987.
MPCS 53001 - Databases
This class is scheduled at a time that does not conflict with any other classes this quarter.