Code layout optimizations for transaction processing workloads
SIGARCH Computer Architecture News
Commercial applications such as databases and Web servers constitute the most important market segment for high-performance servers. Among these applications, on-line transaction processing (OLTP) workloads provide a challenging set of requirements for system designs since they often exhibit inefficient executions dominated by a large memory stall component. This behavior arises from large instruction and data footprints and high communication miss rates. A number of recent studies have
... rized the behavior of commercial workloads and proposed architectural features to improve their performance. However, there has been little research on the impact of software and compiler-level optimizations for improving the behavior of such workloads. This paper provides a detailed study of profile-driven compiler optimizations to improve the code layout in commercial workloads with large instruction footprints. Our compiler algorithms are implemented in the context of Spike, an executable optimizer for the Alpha architecture. Our experiments use the Oracle commercial database engine running an OLTP workload, with results generated using both full system simulations and actual runs on Alpha multiprocessors. Our results show that code layout optimizations can provide a major improvement in the instruction cache behavior, providing a 55% to 65% reduction in the application misses for 64-128K caches. Our analysis shows that this improvement primarily arises from longer sequences of consecutively executed instructions and more reuse of cache lines before they are replaced. We also show that the majority of application instruction misses are caused by self-interference. However, code layout optimizations significantly reduce the amount of self-interference, thus elevating the relative importance of interference with operating system code. Finally, we show that better code layout can also provide substantial improvements in the behavior of other memory system components such as the instruction TLB and the unified second-level cache. The overall performance impact of our code layout optimizations is an improvement of 1.33 times in the execution time of our workload.