Databases on Modern Hardware: How to Stop Underutilization and Love Multicores (Synthesis Lectures on Data Management)
Anastasia Ailamaki, Erietta Liarou, Pinar Tözün
Data management systems enable various influential applications from high-performance online services (e.g., social networks like Twitter and Facebook or financial markets) to big data analytics (e.g., scientific exploration, sensor networks, business intelligence). As a result, data management systems have been one of the main drivers for innovations in the database and computer architecture communities for several decades. Recent hardware trends require software to take advantage of the abundant parallelism existing in modern and future hardware. The traditional design of the data management systems, however, faces inherent scalability problems due to its tightly coupled components. In addition, it cannot exploit the full capability of the aggressive micro-architectural features of modern processors. As a result, today's most commonly used server types remain largely underutilized leading to a huge waste of hardware resources and energy.
In this book, we shed light on the challenges present while running DBMS on modern multicore hardware. We divide the material into two dimensions of scalability: implicit/vertical and explicit/horizontal.
The first part of the book focuses on the vertical dimension: it describes the instruction- and data-level parallelism opportunities in a core coming from the hardware and software side. In addition, it examines the sources of under-utilization in a modern processor and presents insights and hardware/software techniques to better exploit the microarchitectural resources of a processor by improving cache locality at the right level of the memory hierarchy.
The second part focuses on the horizontal dimension, i.e., scalability bottlenecks of database applications at the level of multicore and multisocket multicore architectures. It first presents a systematic way of eliminating such bottlenecks in online transaction processing workloads, which is based on minimizing unbounded communication, and shows several techniques that minimize bottlenecks in major components of database management systems. Then, it demonstrates the data and work sharing opportunities for analytical workloads, and reviews advanced scheduling mechanisms that are aware of nonuniform memory accesses and alleviate bandwidth saturation.