Optimistic Algorithms for Partial Database Replication
Lecture Notes in Computer Science
In this paper, we study the problem of partial database replication. Partial replication is motivated by three reasons. First, database sites might not have enough memory or disk resources to fully replicate data. Second, when access locality is observed, full replication is pointless. Third, full replication protocols have limited scalability. Numerous previous works have investigated database replication; however, most of them focus on full replication. In this paper we are
... in genuine partial replication protocols, which require replicas to permanently store only information about data items they replicate. We define two properties to characterize partial replication. The first one, Quasi-Genuine Partial Replication, captures the above idea; the second one, Non-Trivial Certification, rules out solutions that would abort transactions unnecessarily in an attempt to ensure the first property. We also present two algorithms that extend the Database State Machine to partial replication and guarantee both Quasi-Genuine Partial Replication and Non-Trivial Certification. Our algorithms compare favorably to existing solutions both in terms of number of messages and communication steps.

Database replication protocols based on group communication have recently received a lot of attention [1, 2, 8, 9, 10, 12, 15, 17]. The main reason for this stems from the fact that group communication primitives offer adequate properties, namely agreement on the messages delivered and on their order, to implement synchronous database replication. Most of the complexity involved in synchronizing database replicas is handled by the group communication layer. Previous work on group-communication-based database replication has focused mainly on full replication. However, full replication might not always be adequate. First, sites might not have enough disk or memory resources to fully replicate the database. Second, when access locality is observed, full replication is pointless. Third, full replication provides limited scalability since every update transaction should be executed by each replica.

In this paper, we extend the Database State Machine (DBSM), a group-communication-based database replication technique, to partial replication. The DBSM is based on the deferred update replication model. Transactions execute locally on one database site and their execution does not cause any interaction with other sites.
Read-only transactions commit locally; update transactions are atomically broadcast to all database sites at commit time for certification. The certification test ensures one-copy serializability: the execution of concurrent transactions on different replicas is equivalent to a serial execution on a single replica. In order to execute the certification test, every database site keeps the writesets of committed transactions.
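
To make the role of the stored writesets concrete, the following is a minimal sketch of a certification test in the fully replicated setting described above. It is not the algorithm of this paper. It assumes the usual deferred-update rule that a transaction aborts if an item it read was written by a transaction that committed concurrently with it. The names `Transaction`, `Site`, `certify`, and `snapshot` are hypothetical, and atomic broadcast is simulated by delivering the transactions to each site in the same order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transaction:
    tid: str
    readset: frozenset
    writeset: frozenset
    # Assumption: number of committed transactions the executing site
    # had applied when this transaction started.
    snapshot: int


@dataclass
class Site:
    # Writesets of committed transactions, in delivery (commit) order.
    committed_writesets: list = field(default_factory=list)

    def certify(self, txn: Transaction) -> bool:
        """Certify a delivered update transaction; return True if it commits."""
        # Transactions committed after txn's snapshot are concurrent with it.
        for writeset in self.committed_writesets[txn.snapshot:]:
            if writeset & txn.readset:
                return False  # txn read an item a concurrent transaction wrote
        self.committed_writesets.append(txn.writeset)
        return True


# Two sites receive the same transactions in the same order,
# as an atomic broadcast would guarantee.
site_a, site_b = Site(), Site()
t1 = Transaction("t1", frozenset({"x"}), frozenset({"x"}), snapshot=0)
t2 = Transaction("t2", frozenset({"x"}), frozenset({"y"}), snapshot=0)
outcomes_a = [site_a.certify(t) for t in (t1, t2)]
outcomes_b = [site_b.certify(t) for t in (t1, t2)]
```

Because the test is deterministic and every site sees the same delivery order, both sites reach the same commit or abort decision without exchanging further messages. In this sketch each site stores every committed writeset, which is the cost that partial replication aims to avoid.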