Multiversion concurrency control for the generalized search tree

Walter Binder, Adina Mosincat, Samuel Spycher, Ion Constantinescu, Boi Faltings
2009 Concurrency and Computation  
Many read-intensive systems, in which fast access to data matters more than the rate at which data can change, use multidimensional index structures such as the generalized search tree (GiST). Although in these systems the indexed data are rarely updated and read access is highly concurrent, the existing concurrency control mechanisms for multidimensional index structures are based on locking techniques, which cause significant overhead. In this article we present the multiversion-GiST (MVGiST), an in-memory mechanism that extends the GiST with multiversion concurrency control (MVCC). The MVGiST enables lock-free read access and ensures a consistent view of the index structure throughout a reader's series of queries by creating lightweight, read-only versions of the GiST that share unchanging nodes among themselves. An example of a system with a high read-to-write ratio, where providing wait-free queries is of utmost importance, is a large-scale directory that indexes web services according to their input and output parameters. A performance evaluation shows that for low update rates, the MVGiST significantly improves scalability with respect to the number of concurrent read accesses when compared with a traditional, locking-based concurrency control mechanism. We propose a technique to control memory consumption and confirm through our evaluation that the MVGiST efficiently manages memory.
To limit the drawback of increased memory consumption, we rely on the common practice of timeouts. Whereas the majority of information systems use query and connection timeouts for performance and security reasons, our solution also benefits from them as a means of controlling memory consumption.
Our implementation of MVCC provides a fixed snapshot of the index across multiple read operations. The reader is free to request a new snapshot at any time, if one is available, or to retain the same snapshot for as many read operations as she or he wishes, up to a specified timeout.
The index structure on which we chose to implement our MVCC design is the GiST [8]. The GiST is a balanced tree that comes with algorithms for navigating and modifying the tree structure. The tree stores keys and record references in its leaf nodes, and the inner nodes contain predicates and references to their child nodes. Each predicate evaluates to true for every key stored beneath the child it references.
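
The node layout just described (keys and record references in leaves, predicates and child references in inner nodes) can be illustrated with a minimal Java sketch. Everything here is an assumption made for illustration: the class names `Interval` and `GistNode`, the use of one-dimensional intervals as the key type, and the bounding interval as the inner-node predicate are not taken from the article or from any GiST implementation. Insertion, node splitting, and balancing are omitted, and nodes are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Closed integer interval, standing in for a user-defined key type (illustrative). */
final class Interval {
    final int lo, hi;
    Interval(int lo, int hi) { this.lo = lo; this.hi = hi; }
    boolean overlaps(Interval o) { return lo <= o.hi && o.lo <= hi; }
    /** Smallest interval covering both; used here as the inner-node predicate. */
    Interval union(Interval o) { return new Interval(Math.min(lo, o.lo), Math.max(hi, o.hi)); }
}

/** Leaves hold keys and record references; inner nodes hold predicates and children. */
final class GistNode {
    final boolean leaf;
    final List<Interval> keys = new ArrayList<>();      // key (leaf) or predicate (inner) per entry
    final List<String> records = new ArrayList<>();     // leaf only: record references
    final List<GistNode> children = new ArrayList<>();  // inner only: child references

    GistNode(boolean leaf) { this.leaf = leaf; }

    GistNode addRecord(Interval key, String record) {
        keys.add(key);
        records.add(record);
        return this;
    }

    /** Stores the child together with a predicate that holds for every key below it. */
    GistNode addChild(GistNode child) {
        keys.add(child.bound());
        children.add(child);
        return this;
    }

    /** Bounding interval of all entries; assumes the node is non-empty. */
    Interval bound() {
        Interval b = keys.get(0);
        for (Interval k : keys) b = b.union(k);
        return b;
    }

    /** Descends only into subtrees whose predicate is consistent with the query. */
    void search(Interval query, List<String> out) {
        for (int i = 0; i < keys.size(); i++) {
            if (!keys.get(i).overlaps(query)) continue;  // predicate false: prune this entry
            if (leaf) out.add(records.get(i));
            else children.get(i).search(query, out);
        }
    }
}

final class GistSketch {
    /** Builds a two-leaf tree and runs one overlap query. */
    static List<String> demo() {
        GistNode left = new GistNode(true)
                .addRecord(new Interval(1, 3), "a")
                .addRecord(new Interval(4, 6), "b");
        GistNode right = new GistNode(true).addRecord(new Interval(10, 12), "c");
        GistNode root = new GistNode(false).addChild(left).addChild(right);
        List<String> hits = new ArrayList<>();
        root.search(new Interval(5, 11), hits);
        return hits;
    }
}
```

In this sketch the query interval [5, 11] overlaps the bounding predicates of both leaves, so both are visited, but only the records whose own keys overlap the query are returned.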
This hierarchy of predicates is essentially what is common to all tree-based index structures. The GiST itself is, however, not a fully implemented search tree. It is a generic structure that allows its user to define the data types to be stored in the tree, as well as the query predicates with which the data tuples can be inserted and retrieved. The advantage is that the GiST provides a 'template' index structure for most tree-based access methods, making it easier to integrate these index structures into databases. Notably, the GiST has been integrated into PostgreSQL [12] and is used, for example, in the PostGIS project [13], a spatially enabled database for geographic information systems.
In this article we present the Multiversion-GiST (MVGiST), a concurrent index structure based on MVCC and the GiST. The features that the MVGiST inherits from the GiST, namely flexibility and query capabilities, are combined with the high read concurrency offered by MVCC. Moreover, the MVGiST presents the reader with an unchanging view of the data, which allows for consistency across multiple queries. This article includes a thorough evaluation of the MVGiST, which we implemented in Java because our use-case is a Java-based system and because we take advantage of Java features such as automated memory management and multithreading. We show the efficiency of our MVGiST implementation compared with a locking-based technique, as well as its relatively moderate memory consumption. The MVGiST may be of interest to a great variety of applications that are based on multidimensional index structures, especially those that have a high read/write ratio and those that would benefit from consistency across multiple queries. An example use-case, where we recently applied the MVGiST, is a directory that indexes web service advertisements in a way that enables efficient, automated service composition.
In our previous work [14-16] we introduced the idea of the MVGiST as a possible concurrency control technique for large-scale service directories. This article continues that work: we give a complete and comprehensive presentation of the MVGiST, propose a concrete mechanism to control memory consumption, show how the MVGiST can be integrated into a real system, and evaluate the performance of our implementation, covering concurrency issues at the implementation level.
The remainder of this article is structured as follows: Section 2 summarizes the features of the GiST. Section 3 presents the MVGiST, introducing the design principles and also discussing implementation issues. In Section 4 we consider a directory of web service advertisements as a use-case, where we successfully applied the MVGiST. In Section 5 we evaluate the performance and scalability of the MVGiST, comparing it with the locking-based concurrency control scheme.

MVCC FOR THE GIST

Supported operations and synchronization issues

There are three general operations that need to be distinguished for the MVGiST:
• Read session: One or more read queries, possible timeout.
• Write operation: Batch of inserts and deletes to be completed on the write tree.
• Read tree creation: Creates a new read tree, does not split write batches.

Relevant objects include those that are referenced by non-static fields or by array elements. Objects referenced by static fields are not considered.
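
The three operations distinguished above (read session, write operation, read tree creation) can be sketched in Java as follows. This is a sketch under stated assumptions, not the authors' implementation: a path-copying binary search tree stands in for the GiST, deletes are left out, and the names `VNode`, `VersionedIndex`, `ReadSession`, `writeBatch`, `publishReadTree`, and `openSession` are hypothetical. The sketch only aims to show how read trees can share unchanged nodes with the write tree, how a reader can pin one snapshot without locking, and how a session timeout bounds the time an old version stays in use.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Immutable binary-search-tree node; a stand-in for a GiST node (assumption). */
final class VNode {
    final int key;
    final VNode left, right;
    VNode(int key, VNode left, VNode right) { this.key = key; this.left = left; this.right = right; }

    /** Path copying: only nodes on the path to the key are recreated, the rest is shared. */
    static VNode insert(VNode n, int key) {
        if (n == null) return new VNode(key, null, null);
        if (key < n.key) return new VNode(n.key, insert(n.left, key), n.right);
        if (key > n.key) return new VNode(n.key, n.left, insert(n.right, key));
        return n; // key already present, nothing to copy
    }

    static boolean contains(VNode n, int key) {
        while (n != null) {
            if (key == n.key) return true;
            n = key < n.key ? n.left : n.right;
        }
        return false;
    }
}

final class VersionedIndex {
    private VNode writeRoot;  // write tree, guarded by 'this'
    private final AtomicReference<VNode> readRoot = new AtomicReference<>(); // latest read tree

    /** Write operation: the whole batch is applied while holding one lock. */
    synchronized void writeBatch(int... keys) {
        for (int k : keys) writeRoot = VNode.insert(writeRoot, k);
    }

    /** Read tree creation: takes the same lock, so it never observes half a batch. */
    synchronized void publishReadTree() { readRoot.set(writeRoot); }

    /** Read session: pins the current read tree without taking any lock. */
    ReadSession openSession(long timeoutMillis) {
        return new ReadSession(readRoot.get(), timeoutMillis);
    }
}

final class ReadSession {
    private final VNode snapshot;       // fixed for the lifetime of the session
    private final long deadlineNanos;

    ReadSession(VNode snapshot, long timeoutMillis) {
        this.snapshot = snapshot;
        this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
    }

    /** Queries the pinned snapshot; rejects the query once the session has timed out. */
    boolean contains(int key) {
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new IllegalStateException("read session timed out");
        }
        return VNode.contains(snapshot, key);
    }
}
```

In this sketch a session opened before `publishReadTree()` keeps seeing the older version, while sessions opened afterwards see the new one. Once an expired session is dropped, nodes reachable only from its snapshot become eligible for garbage collection, which is one way the timeout can bound memory use.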
doi:10.1002/cpe.1387 fatcat:vyqosebp5bherehnxsgidh4dya