Biswabandan Panda, Shankar Balachandran
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
Aggressive prefetching improves system performance by hiding and tolerating off-chip memory latency. However, on a multicore system, prefetchers of different cores contend for shared resources and aggressive prefetching can degrade the overall system performance. The role of a prefetcher aggressiveness engine is to select appropriate aggressiveness levels for each prefetcher such that shared resource contention caused by prefetchers is reduced, thereby improving system performance.
State-of-the-art prefetcher aggressiveness engines monitor metrics such as prefetch accuracy, bandwidth consumption, and last-level cache pollution. They use carefully tuned thresholds for these metrics and trigger aggressiveness control measures when the thresholds are crossed. These engines have three major shortcomings: (1) the thresholds depend on the system configuration (cache size, DRAM scheduling policy, and cache replacement policy) and must be tuned accordingly, (2) no single threshold works well across all workloads, and (3) the thresholds are oblivious to the phase changes of applications. To overcome these shortcomings, we propose CAFFEINE, a model-based approach that analyzes the effectiveness of a prefetcher and uses a metric called net utility to control its aggressiveness. Net utility estimates the net processor cycles that prefetching saves, approximating the cycles saved across the memory subsystem from the last-level cache to DRAM. We evaluate CAFFEINE across a wide range of workloads and compare it with the state-of-the-art prefetcher aggressiveness engine. Experimental results show that, on average (geomean), CAFFEINE achieves 9.5% (up to 38.29%) and 11% (up to 20.7%) better performance than the best-performing aggressiveness engine for four-core and eight-core systems, respectively.
ACM Reference Format: Biswabandan Panda and Shankar Balachandran. 2015. CAFFEINE: A utility-driven prefetcher aggressiveness engine for multicores.
... the phase-change behavior of applications. In effect, these techniques fail to identify certain scenarios in which prefetcher-caused intercore interference is significant, and they lose opportunities for performance improvement.
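To make the threshold-based approach concrete, here is a minimal Python sketch of the kind of feedback rule described above: per-interval metrics (prefetch accuracy, bandwidth consumption, cache pollution) are compared against fixed thresholds, and crossing a threshold triggers a throttle-down or throttle-up. All threshold values, the `IntervalStats` fields, and the five-level aggressiveness scale are hypothetical placeholders chosen for illustration; they are not the actual rules or parameters of HPAC or any other published engine.

```python
from dataclasses import dataclass

# Hypothetical thresholds. Real engines tune such constants per system
# configuration, which is shortcoming (1) listed in the abstract.
ACCURACY_HIGH = 0.75
ACCURACY_LOW = 0.40
BANDWIDTH_HIGH = 0.50   # fraction of DRAM bandwidth used by this core's prefetches
POLLUTION_HIGH = 0.25   # fraction of demand misses caused by prefetch evictions

MIN_LEVEL, MAX_LEVEL = 1, 5  # hypothetical aggressiveness levels (degree/distance)


@dataclass
class IntervalStats:
    accuracy: float    # useful prefetches / issued prefetches
    bandwidth: float
    pollution: float


def threshold_controller(level: int, s: IntervalStats) -> int:
    """Return the next aggressiveness level using fixed thresholds only."""
    if (s.accuracy < ACCURACY_LOW or s.pollution > POLLUTION_HIGH
            or s.bandwidth > BANDWIDTH_HIGH):
        return max(MIN_LEVEL, level - 1)   # a threshold was crossed: throttle down
    if s.accuracy > ACCURACY_HIGH:
        return min(MAX_LEVEL, level + 1)   # accurate and cheap: throttle up
    return level                           # inside the band: keep the current level


if __name__ == "__main__":
    print(threshold_controller(3, IntervalStats(0.80, 0.10, 0.05)))  # 4
    print(threshold_controller(3, IntervalStats(0.80, 0.60, 0.05)))  # 2
```

In this sketch the decision depends entirely on where the constants sit: the same interval statistics can trigger opposite actions on a system with a different cache size or DRAM scheduler, and the rule never estimates how many cycles prefetching actually saves. That missing quantity is what a utility-driven metric such as net utility targets.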
Our Goal: Our goal is to design a prefetcher aggressiveness engine that improves system performance by (1) tolerating prefetcher-caused intercore interference as long as performance improves and (2) minimizing the prefetcher-caused intercore interference that degrades performance.
Our Approach: In this work, we propose CAFFEINE, a utility-driven prefetcher aggressiveness engine based on the buffet principle [Mahajan et al. 2008]: "continue using more resources as long as the marginal cost can be driven lower than the marginal benefit." CAFFEINE applies the buffet principle to controlling the aggressiveness of prefetchers on a multicore system: it keeps increasing aggressiveness as long as doing so is likely to improve overall system performance. We propose a metric called net utility (utility_net), which quantifies the net processor cycles saved by a prefetcher. We use this metric to divide the prefetchers into two groups: affecting and affected. Our technique throttles down the affecting prefetchers and throttles up the affected prefetchers when doing so is likely to improve system performance. We make these throttling decisions without using any thresholds. CAFFEINE uses CAFFEINATION when the prefetcher-caused interference is tolerable (as defined in Section 3.1) and DE-CAFFEINATION when it is intolerable. Both techniques use utility_net to make throttling decisions.
Key Idea: Our idea is based on the observation that "different levels of prefetcher aggressiveness provide different net utilities." At any given instant, CAFFEINE tries to maximize the utility_net of the entire prefetching unit of a multicore system.
We make the following contributions:
1. We propose a metric called utility_net to measure the utility of hardware prefetchers. utility_net indicates the net processor cycles saved because of prefetching at a given aggressiveness level (Section 3.1).
2. We design a model-based, utility-driven prefetcher aggressiveness engine for multicore systems, called CAFFEINE, which uses utility_net (Section 3.2).
3. We evaluate CAFFEINE on four- and eight-core systems and show its effectiveness by comparing it with HPAC [Ebrahimi et al. 2009]. Compared to HPAC, CAFFEINE improves performance (geomean of harmonic speedups) by 9.5% and 11% across 100 four-core and 64 eight-core workloads, respectively (Section 6).
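The buffet principle can be illustrated with a small Python sketch built on simplifying assumptions of our own: each prefetcher's net utility (utility_net) at a level is modeled as the cycles it saves its own core minus the cycles of delay its prefetches impose on other cores, these terms are assumed independent across cores, and a greedy loop moves one prefetcher one level at a time only while the system-wide gain of that move is positive (marginal benefit exceeds marginal cost). The profile fields `saved` and `inflicted`, the numbers, and the greedy search are hypothetical; this is not the paper's model (Section 3.1), and neither the affecting/affected grouping nor CAFFEINATION/DE-CAFFEINATION is reproduced here.

```python
from typing import Dict, List

# Hypothetical per-level profile (cycles per interval); level 0 = prefetching off.
#   saved[l]     : cycles the prefetcher saves its own core at level l
#   inflicted[l] : cycles of extra delay its prefetches impose on other cores
Profile = Dict[str, List[int]]


def utility_net(profile: Profile, level: int) -> int:
    """Net cycles saved by one prefetcher at `level` (simplified, separable model)."""
    return profile["saved"][level] - profile["inflicted"][level]


def system_utility(profiles: List[Profile], levels: List[int]) -> int:
    """Sum of utility_net over all prefetchers at their current levels."""
    return sum(utility_net(p, l) for p, l in zip(profiles, levels))


def buffet_throttle(profiles: List[Profile], levels: List[int]) -> List[int]:
    """Greedily apply the single one-level move with the largest positive
    system-wide gain; stop when no move pays for itself."""
    levels = list(levels)
    while True:
        current = system_utility(profiles, levels)
        best_gain, best_move = 0, None
        for i, p in enumerate(profiles):
            for step in (+1, -1):
                new = levels[i] + step
                if 0 <= new < len(p["saved"]):
                    trial = levels[:i] + [new] + levels[i + 1:]
                    gain = system_utility(profiles, trial) - current
                    if gain > best_gain:
                        best_gain, best_move = gain, (i, new)
        if best_move is None:   # marginal cost >= marginal benefit everywhere
            return levels
        i, new = best_move
        levels[i] = new


if __name__ == "__main__":
    # Hypothetical example: core 0 benefits from aggressive prefetching,
    # core 1's prefetches mostly interfere with the other core.
    profiles = [
        {"saved": [0, 400, 700, 900, 1000], "inflicted": [0, 50, 100, 200, 400]},
        {"saved": [0, 100, 150, 180, 200], "inflicted": [0, 80, 200, 350, 500]},
    ]
    print(buffet_throttle(profiles, [4, 4]))  # [3, 1]
```

Under this separable model each prefetcher simply settles where its own net utility peaks. On a real multicore the interference terms depend on what every other core is doing at the same time, which is why an engine has to re-measure utility online rather than consult a fixed table; greedy single-step moves can also stop at a local maximum if a profile is not unimodal.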
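The evaluation reports a geometric mean of harmonic speedups. As a small worked sketch, the Python below uses the common definition of harmonic speedup for an N-core workload, HS = N / Σ_i (IPC_i,alone / IPC_i,shared), and then computes a geometric-mean improvement of one engine over a baseline across workloads. The IPC and HS values are invented for illustration and are not results from the paper; that the paper aggregates in exactly this way is an assumption.

```python
import math
from typing import Sequence


def harmonic_speedup(ipc_alone: Sequence[float], ipc_shared: Sequence[float]) -> float:
    """N / sum(IPC_alone_i / IPC_shared_i) for one N-core workload."""
    n = len(ipc_alone)
    return n / sum(a / s for a, s in zip(ipc_alone, ipc_shared))


def geomean(values: Sequence[float]) -> float:
    """Geometric mean computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_improvement(hs_new: Sequence[float], hs_base: Sequence[float]) -> float:
    """Geometric mean of per-workload HS ratios minus 1 (0.095 means 9.5%)."""
    return geomean([n / b for n, b in zip(hs_new, hs_base)]) - 1.0


if __name__ == "__main__":
    # Hypothetical four-core workload where every core runs at half its alone IPC.
    print(harmonic_speedup([2.0, 1.5, 1.0, 0.8], [1.0, 0.75, 0.5, 0.4]))  # 0.5
    # Hypothetical HS values for two workloads under a new engine vs. a baseline.
    print(round(geomean_improvement([0.55, 0.60], [0.50, 0.50]), 3))      # 0.149
```

Harmonic speedup rewards fairness as well as throughput: a single heavily slowed core inflates the denominator, so an engine cannot score well by speeding up some cores at the expense of others.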
doi:10.1145/2806891