Systems-oriented multi-dimensional indexes: SP-GiST - The case for space-partitioning trees
Date and Time of the talk: April 15 2021, 9:30 AM EDT
Information of the Speaker
Walid G. Aref, Purdue University and Alexandria University-Egypt
Walid G. Aref got his Ph.D. from UMD in 1993. His research interests are in extending the functionality of database systems in support of emerging applications, e.g., spatial, spatio- temporal, graph, biological, and sensor databases. He is also interested in query processing, indexing, data streaming, and geographic information systems (GIS). Walid’s research has been supported by the National Science Foundation, the National Institute of Health, Purdue Research Foundation, CERIAS, Panasonic, and Microsoft Corp. In 2001, he received the CAREER Award from the National Science Foundation and in 2004, he received a Purdue University Faculty Scholar award. Walid is a member of Purdue’s CERIAS. He is the Editor-in-Chief of the ACM Transactions of Spatial Algorithms and Systems (ACM TSAS), an editorial board member of the Journal of Spatial Information Science (JOSIS), and has served as an editor of the VLDB Journal and the ACM Transactions of Database Systems (ACM TODS). Walid has won several best paper awards including the 2016 VLDB ten-year best paper award. He is a Fellow of the IEEE, and a member of the ACM. Between 2011 and 2014, Walid has served as the chair of the ACM Special Interest Group on Spatial Information (SIGSPATIAL).
Abstract
Realizing an index inside a database engine is a nontrivial task. It can take many programmer months to realize one index inside the DBMS. In addition to the need to realize the logic of the index operations inside the DBMS, one has to take care of other important issues for the introduced index including (1) Concurrency control, (2) Recovery in the presence of failures, transaction aborts, and system crashes, (3) Resource management, and (4) Memory, disk, and buffer management. Thus, from a software-engineering perspective, it is hard to cope with the many new ideas that either introduce new multi-dimensional indexes or that enhance the performance of existing ones. In this talk, I will highlight several software-engineering solutions to this problem. I will give an overview of generalized search trees for multi-dimensional data, mainly Space-Partitioning Generalized Search Trees (SP-GiST) and Data-Partitioning Generalized Search Trees (GiST) that cover the classes of disk-based space-driven and data-driven multi-dimensional indexes, respectively. In the rest of the talk, I will focus on SP-GiST and demonstrate how SP-GiST can realize many variants of disk-based quad-trees and disk-based trie index structures. I will also demonstrate the performance potential of several instantiations of space-partitioning trees using SP-GiST inside of PostgreSQL, and when they can outperform highly optimization index structures, e.g., the B+-tree.