Tashkent

Sameh Elnikety, Steven Dropsho, Fernando Pedone
2006, Proceedings of the 2006 EuroSys Conference (EuroSys '06)
In standalone databases, the two functions of ordering transaction commits and making the effects of transactions durable are generally performed in one action, namely the writing of the commit record to disk. In replicated database systems where all replicas agree on the commit order of update transactions, these two functions are naturally separated: the replication middleware determines the global commit order, while the database replicas make transactions durable. The contribution of this paper is to demonstrate that this traditional separation of commit ordering from durability in replicated designs forces update transactions to be made durable serially to disk, a potentially significant scalability bottleneck. Two solutions are possible: (1) keep durability in the database and pass the global commit order from the replication middleware to the database, or (2) move durability from the database to the replication middleware. We show that, regardless of the method, uniting ordering and durability greatly improves system scalability. We implement two example scalable replicated database systems, Tashkent-MW and Tashkent-API, to show the benefits of joining global commit order and durability. Tashkent-MW is a pure middleware solution that combines ordering and durability in the middleware and treats an unmodified database as a black box. It represents a high-performance replication solution suitable for closed-source, off-the-shelf standalone databases. In Tashkent-API, we modify the API of the open-source PostgreSQL database so that the middleware can specify the commit order, combining ordering and durability inside the database. We compare both Tashkent systems to a similar replicated system, called Base, in which ordering and durability remain separated. Under high update transaction loads at 15 replicas, both Tashkent systems greatly improve scalability and outperform Base in throughput by factors of 5 and 3, respectively, with lower response times.
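The bottleneck described in the abstract can be illustrated with a toy model. The sketch below is not the Tashkent implementation: `SimulatedDisk`, `commit_serially`, `commit_grouped`, and `batch_size` are hypothetical names introduced here, and a counter stands in for real synchronous disk writes. It only contrasts how many synchronous writes are needed when each transaction must be made durable before the next one is submitted, versus when the component that knows the commit order can make a batch of transactions durable in one write.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimulatedDisk:
    """Hypothetical stand-in for a log device; each sync_write models one fsync."""
    sync_count: int = 0
    log: List[int] = field(default_factory=list)

    def sync_write(self, txn_ids: List[int]) -> None:
        # Append the commit records in the given order, then count one sync.
        self.log.extend(txn_ids)
        self.sync_count += 1


def commit_serially(disk: SimulatedDisk, ordered_txns: List[int]) -> None:
    """Ordering separated from durability (assumed model of a Base-like design):
    the next transaction is submitted only after the previous one is durable."""
    for txn in ordered_txns:
        disk.sync_write([txn])


def commit_grouped(disk: SimulatedDisk, ordered_txns: List[int], batch_size: int) -> None:
    """Ordering united with durability (assumed model): commit records that are
    already ordered are written in batches, one sync per batch."""
    for start in range(0, len(ordered_txns), batch_size):
        disk.sync_write(ordered_txns[start:start + batch_size])


# Twelve update transactions in their global commit order.
txns = list(range(1, 13))

serial_disk = SimulatedDisk()
commit_serially(serial_disk, txns)

grouped_disk = SimulatedDisk()
commit_grouped(grouped_disk, txns, batch_size=4)

# Both logs preserve the global commit order; only the number of syncs differs.
print(serial_disk.sync_count, grouped_disk.sync_count)  # prints "12 3"
```

In this model the commit order recorded in both logs is identical, so grouping changes only the number of synchronous writes, not the order. The batch size of 4 is an arbitrary illustrative value, not a figure from the paper.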
doi:10.1145/1217935.1217947 dblp:conf/eurosys/ElniketyDP06 fatcat:seis6reftvbhdibpbofaquzgga