The course will cover the implementation aspects of data management systems, using relational database engines as a starting point to introduce the basic concepts of efficient data processing and then extending those concepts to modern implementations in data centers and the cloud.
The goal of the course is to convey the fundamental aspects of efficient data management from a systems implementation perspective: storage, access, organization, indexing, consistency, concurrency, transactions, distribution, query compilation vs. interpretation, data representations, etc. Using conventional relational engines as a starting point, the course aims to provide in-depth coverage of the latest technologies used in data centers and the cloud to implement large-scale data processing in various forms.
The course will first cover fundamental concepts in data management: storage, locality, query optimization, declarative interfaces, concurrency control and recovery, buffer managers, and management of the memory hierarchy, presenting them in a system-independent manner. The course will place a special emphasis on understanding these basic principles, as they are key to understanding what problems existing systems try to address. It will then explore their implementation in modern relational engines supporting SQL, and finally expand to the range of systems used in the cloud: key-value stores, geo-replication, query as a service, serverless, large-scale analytics engines, etc.
The main source of information for the course will be articles and research papers describing the architecture of the systems discussed. The list of papers will be provided at the beginning of the course.
Performance assessment information (valid until the course unit is held again)