Automatic partitioning of database applications

Alvin Cheung, Samuel Madden, Owen Arden, Andrew C. Myers
2012 Proceedings of the VLDB Endowment  
Database-backed applications are nearly ubiquitous in our daily lives. Applications that make many small accesses to the database create two challenges for developers: increased latency and wasted resources from numerous network round trips. A well-known technique to improve transactional database application performance is to convert part of the application into stored procedures that are executed on the database server. Unfortunately, this conversion is often difficult. In this paper we describe Pyxis, a system that takes database-backed applications and automatically partitions their code into two pieces, one of which is executed on the application server and the other on the database server. Pyxis profiles the application and server loads, statically analyzes the code's dependencies, and produces a partitioning that minimizes the number of control transfers as well as the amount of data sent during each transfer. Our experiments using TPC-C and TPC-W show that Pyxis is able to generate partitions with up to 3× reduction in latency and 1.7× improvement in throughput when compared to a traditional non-partitioned implementation, and that it has performance comparable to that of a custom stored procedure implementation.
Pyxis splits the application into two programs: one is deployed on the application server and the other in the database server as stored procedures. The two programs communicate with each other via remote procedure calls (RPCs) to implement the semantics of the original application. To generate a partition, Pyxis first analyzes the application source code using static analysis and then collects dynamic information such as a runtime profile and machine loads. The collected profile data and the analysis results are then used to formulate a linear program whose objective is to minimize, subject to a maximum CPU load, the overall latency due to network round trips between the application and database servers, as well as the amount of data sent during each round trip. Solving the linear program yields a fine-grained, statement-level partitioning of the application's source code. The partitioned code is split into two halves and executed on the application and database servers using the Pyxis runtime. The main benefit of our approach is that the developer does not need to manually decide which part of her program should be executed where.
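The optimization described above can be illustrated on a toy program. The sketch below is a simplified stand-in, not Pyxis's actual formulation: it places each statement on the application server (`APP`) or the database server (`DB`), charges one unit per profiled control transfer plus a small per-byte cost for data crossing the split, and rejects placements whose database CPU load exceeds a budget. Instead of solving a linear program, it exhaustively enumerates placements, which is only feasible for a handful of statements. The statement names, profile numbers, cost weights, and the pinning of `s2` (assumed to issue a query) are all hypothetical.

```python
from itertools import product

# Hypothetical profile of a four-statement program. CPU holds each
# statement's profiled CPU cost if it were run on the database server.
CPU = {"s1": 2.0, "s2": 1.0, "s3": 4.0, "s4": 1.0}

# Control-flow edges: (from, to, profiled execution count).
CONTROL = [("s1", "s2", 10), ("s2", "s3", 10), ("s3", "s4", 1)]

# Data dependencies: (definition, use, bytes shipped if the two are split).
DATA = [("s1", "s3", 400), ("s2", "s4", 50)]

# Assumption: s2 issues a query and is pinned to the database server.
PINNED = {"s2": "DB"}

LATENCY_PER_TRANSFER = 1.0  # assumed cost of one control transfer (round trip)
COST_PER_BYTE = 0.01        # assumed cost of shipping one byte


def cost(placement):
    """Profile-weighted control transfers plus bytes crossing the split."""
    transfers = sum(n for a, b, n in CONTROL if placement[a] != placement[b])
    shipped = sum(nb for a, b, nb in DATA if placement[a] != placement[b])
    return LATENCY_PER_TRANSFER * transfers + COST_PER_BYTE * shipped


def db_load(placement):
    """Total CPU cost of the statements placed on the database server."""
    return sum(CPU[s] for s, side in placement.items() if side == "DB")


def best_partition(cpu_budget):
    """Return (cost, placement) minimizing cost under the DB CPU budget.

    Exhaustive search over all 2^n placements; a stand-in for the linear
    program Pyxis solves, usable only for tiny programs. Returns None if no
    placement satisfies both the pins and the budget.
    """
    stmts = sorted(CPU)
    best = None
    for sides in product(("APP", "DB"), repeat=len(stmts)):
        placement = dict(zip(stmts, sides))
        if any(placement[s] != side for s, side in PINNED.items()):
            continue
        if db_load(placement) > cpu_budget:
            continue
        c = cost(placement)
        if best is None or c < best[0]:
            best = (c, placement)
    return best


if __name__ == "__main__":
    for budget in (8.0, 3.0, 0.0):
        print(budget, best_partition(budget))
```

With a budget of 8.0 every statement fits on the database server and the cost drops to zero, mirroring a stored-procedure implementation. With a budget of 3.0, `s3` no longer fits, so the best placement keeps `s1` and `s2` on the database and moves `s3` and `s4` to the application server, at a cost of 14.5 units. With a budget of 0.0, the pinned `s2` cannot be placed, so no partition exists.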
Pyxis identifies good candidate code blocks for conversion to stored procedures and automatically produces the two distinct pieces of code from the single application codebase. When the application is modified, Pyxis can automatically regenerate and redeploy this code. By periodically re-profiling their application, developers can generate new partitions as the server load or the application code changes. Furthermore, the system can switch between partitions as necessary by monitoring the current server load. Pyxis makes several contributions:
1. We present a formulation for automatically partitioning programs into stored procedures so as to minimize overall latency subject to CPU resource constraints. Our formulation leverages a combination of static and dynamic program analysis to construct a linear optimization problem whose solution is our desired partitioning.
2. We develop an execution model for partitioned applications in which consistency of the distributed heap is maintained by automatically generating custom synchronization operations.
3. We implement a method for adapting to changes in real-time server load by dynamically switching between pre-generated partitions created using different resource constraints.
4. We evaluate our Pyxis implementation on two popular transaction processing benchmarks, TPC-C and TPC-W, and compare the performance of our partitions to the original program and to versions using manually created stored procedures.
Our results show that Pyxis can automatically partition database programs to get the best of both worlds: when CPU resources are plentiful, Pyxis produces a partition with performance comparable to that of hand-coded stored procedures; when resources are limited, it produces a partition comparable to simple client-side queries.
The rest of the paper is organized as follows. We start with an architectural overview of Pyxis in Sec. 2. We describe how Pyxis programs execute and synchronize data in Sec. 3. We present the optimization problem and describe how solutions are obtained in Sec. 4. Sec. 5 explains the generation of partitioned programs, and Sec. 6 describes the Pyxis runtime system. Sec. 7 shows our experimental results, followed by related work and conclusions in Sec. 8 and Sec. 9.
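The third contribution, switching between pre-generated partitions as server load changes, might look roughly like the following sketch. It assumes that each partition is tagged with the fraction of database CPU it was optimized to use, that the runtime receives periodic database CPU utilization samples in [0, 1], and that load is smoothed with an exponentially weighted moving average. The `Partition` and `PartitionSwitcher` names and the selection rule are hypothetical, not the Pyxis runtime's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    db_cpu_budget: float  # fraction of DB CPU this partition was optimized for


class PartitionSwitcher:
    """Chooses among pre-generated partitions from observed database load."""

    def __init__(self, partitions, alpha=0.5):
        if not partitions:
            raise ValueError("need at least one partition")
        # Most database-heavy partition first.
        self.partitions = sorted(partitions, key=lambda p: p.db_cpu_budget, reverse=True)
        self.alpha = alpha          # EWMA weight given to the newest sample
        self.smoothed_load = 0.0    # smoothed DB CPU utilization in [0, 1]

    def observe(self, load_sample):
        """Fold in a new utilization sample and return the partition to use."""
        self.smoothed_load = (self.alpha * load_sample
                              + (1 - self.alpha) * self.smoothed_load)
        return self.current()

    def current(self):
        headroom = 1.0 - self.smoothed_load
        # Pick the most DB-heavy partition whose budget fits the headroom.
        for p in self.partitions:
            if p.db_cpu_budget <= headroom:
                return p
        # Otherwise fall back to the partition that puts the least work on the DB.
        return self.partitions[-1]


if __name__ == "__main__":
    switcher = PartitionSwitcher(
        [Partition("client-side", 0.0), Partition("mixed", 0.4), Partition("sproc-like", 0.8)],
        alpha=1.0,
    )
    for sample in (0.1, 0.5, 0.95):
        print(sample, switcher.observe(sample).name)
```

With `alpha=1.0` (no smoothing), samples of 0.1, 0.5, and 0.95 select `sproc-like`, `mixed`, and `client-side` respectively. A smaller `alpha` trades responsiveness for stability, so a brief load spike does not immediately flip the application to a different partition.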
doi:10.14778/2350229.2350262 fatcat:2uhprxll7bhnxaatq64c7nvlw4