Advanced Databases

Title Advanced Databases (53003)
Quarter Summer 2015
Instructor TBD
Website

Syllabus

NOTE: This is a tentative syllabus. The MPCS is still working with Dr. Malik on preparing a final syllabus for the course.

Course Contents
The objective of this course is to (i) expand students' knowledge by covering new topics that represent the state of the art in database management systems and distributed systems, and (ii) build upon the foundations developed in MPCS 53001 - Databases by covering topics in greater depth. The following topics form the basis of the course content:

 

  • Distributed Databases

    • Distributed database design; query processing.

    • Parallel databases.

  • Distributed Transactions

    • Two phase commit protocol and its applications.

    • Concurrency control in distributed databases.

  • NoSQL databases

    • Architecture and variety of key-value (KV) stores and their use in applications.

    • Consistency-Availability-Partition tolerance (CAP) theorem and Basically Available, Soft-state, Eventually consistent (BASE) semantics in KV stores.

  • Data Warehousing

    • Materialized views and Online Analytical Processing (OLAP) querying in data warehouses.

    • Modern systems for processing OLAP workloads, such as HadoopDB, Hive.

  • Alternative Data Storage & Models

    • Difference between row vs column stores; Performance comparisons.

    • Introduction to column and array stores (SciDB).

  • Graph Databases

    • Graph processing in databases. Introduction to tree databases, such as Extensible Markup Language (XML) databases.

    • Modern large-scale graph data management, e.g., Pregel, Tao, etc.

  • Object-Relational databases and Cloud databases

    • Object-relational mappers (ORM), such as SQLAlchemy, PostgresORM.

    • Relational services in the cloud; Performance issues.

  • Geospatial databases

    • GiST indexes in PostGIS, space-filling curves, geohashing.

     

 

Reading Assignments

There will be readings that introduce the topic covered in each class, and simple questions to test understanding. Readings shall be chosen from reference books and articles describing the state-of-the-art.

 

Weekly Assignments

The goal of the assignments is to help students understand and evaluate a system or concept and, through reading articles, see how that concept applies in a real-world setting.

Projects

The primary component of this course will be a class project carried out in groups of 2-3 students. Students are expected to organize into groups and implement a project that (1) is relevant to the material discussed in class and (2) requires a significant programming effort from all team members.

 

For instance, being able to compare the performance of different DBMSs and of different storage and access techniques is vital for the database community, and it requires a sufficient understanding of how underlying resources such as CPU, memory, and disk are used. Some projects that will help students understand the tradeoffs are:

  • The best database for graph data: comparing a graph database, KV stores, and a relational database for storing graphs.

  • An implementation of basic array processing in Hadoop.

  • Developing a benchmark for NoSQL databases using real-world data, such as that from Wikimedia.

  • Investigating the performance and scalability characteristics of Amazon RDS. Since RDS services run in a virtualized environment, studying the “stability” and “isolation” of the performance offered is also of interest.

  • Designing a hybrid OLTP/OLAP database system.

  • A performance dashboard for a distributed relational/KV database.

 

Grading

  • Reading Assignments and Class Participation: 10%

  • Weekly Assignments: 60%

  • Programming Project: 30%

 

Reference Books

Readings will be assigned from the following textbooks:

  • Principles of Distributed Database Systems, by M. Tamer Ozsu and Patrick Valduriez. Prentice Hall, 1999, ISBN 0-13-659707-6.

  • Readings in Database Systems (The Red Book), 4th ed., by Joseph Hellerstein and Michael Stonebraker. MIT Press, 2005, ISBN 9780262693141.

  • Database Management Systems, 3rd ed., by R. Ramakrishnan and J. Gehrke. McGraw-Hill.

  • Concurrency Control and Recovery in Database Systems, by P.A. Bernstein, V. Hadzilacos and N. Goodman. Addison-Wesley, 1987.

Prerequisites (Courses)

Prerequisites (Other)

MPCS 53001 - Databases

Satisfies

Time

TBD

Location

TBD