Characterizing multi-threaded applications for designing sharing-aware last-level cache replacement policies
2013 IEEE International Symposium on Workload Characterization (IISWC)
Recent years have seen a large volume of proposals on managing the shared last-level cache (LLC) of chipmultiprocessors (CMPs). However, most of these proposals primarily focus on reducing the amount of destructive interference between competing independent threads of multi-programmed workloads. While very few of these studies evaluate the proposed policies on shared memory multi-threaded applications, they do not improve constructive cross-thread sharing of data in the LLC. In this paper, we
... aracterize a set of multi-threaded applications drawn from the PARSEC, SPEC OMP, and SPLASH-2 suites with the goal of introducing sharing-awareness in LLC replacement policies. We motivate our characterization study by quantifying the potential contributions of the shared and the private blocks toward the overall volume of the LLC hits in these applications and show that the shared blocks are more important than the private blocks. Next, we characterize the amount of sharing-awareness enjoyed by recent proposals compared to the optimal policy. We design and evaluate a generic oracle that can be used in conjunction with any existing policy to quantify the potential improvement that can come from introducing sharingawareness. The oracle analysis shows that introducing sharingawareness reduces the number of LLC misses incurred by the least-recently-used (LRU) policy by 6% and 10% on average for a 4MB and 8MB LLC respectively. A realistic implementation of this oracle requires the LLC controller to have the capability to accurately predict, at the time a block is filled into the LLC, whether the block will be shared during its residency in the LLC. We explore the feasibility of designing such a predictor based on the address of the fill and the program counter of the instruction that triggers the fill. Our sharing behavior predictability study of two history-based fill-time predictors that use block addresses and program counters concludes that achieving acceptable levels of accuracy with such predictors will require other architectural and/or high-level program semantic features that have strong correlations with active sharing phases of the LLC blocks.