ASR: Adaptive Selective Replication for CMP Caches

Bradford Beckmann, Michael Marty, David Wood
Proceedings of the Annual International Symposium on Microarchitecture (MICRO), 2006
The large working sets of commercial and scientific workloads stress the L2 caches of Chip Multiprocessors (CMPs). Some CMPs use a shared L2 cache to maximize the on-chip cache capacity and minimize off-chip misses. Others use private L2 caches, replicating data to limit the delay due to global wires and minimize cache access time. Recent hybrid proposals use selective replication to balance latency and capacity, but their static replication rules result in performance degradation for some combinations of workloads and system configurations. This paper proposes Adaptive Selective Replication (ASR), a mechanism that dynamically monitors workload behavior to control replication. ASR replicates cache blocks only when it estimates the benefit of replication (lower L2 hit latency) exceeds the cost (more L2 misses). Full-system simulations of 8-processor CMPs show that ASR provides robust performance: improving performance by as much as 29% versus shared caches, 19% versus private caches, and 12% versus CMP-NuRapid [9] and Victim Replication [41]. Furthermore, while ASR does not improve the performance of all workloads, it provides performance stability by always performing at least comparably to the best alternative including Cooperative Caching [8].

• We introduce Selective Probabilistic Replication (SPR), a simple replication mechanism that exploits the fact that the most frequently requested L2 blocks are also the most frequently evicted L1 blocks. By using probabilistic filtering, SPR requires significantly less hardware than CMP-NuRapid and Cooperative Caching, and equivalent hardware to Victim Replication.
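The cost/benefit rule and the probabilistic filter described above can be illustrated with a minimal Python sketch. It is not the paper's hardware mechanism: the replication levels in `LEVELS`, the function names, the cycle-based estimates, and the "step down otherwise" policy are all assumptions made for illustration only.

```python
import random

# Hypothetical replication probabilities, chosen only for illustration;
# the abstract does not specify the levels ASR uses.
LEVELS = [0.0, 0.25, 0.5, 1.0]


def replication_benefit(extra_local_hits, cycles_saved_per_hit):
    """Estimated cycles saved by lower L2 hit latency (assumed model)."""
    return extra_local_hits * cycles_saved_per_hit


def replication_cost(extra_misses, miss_penalty_cycles):
    """Estimated cycles lost to additional L2 misses (assumed model)."""
    return extra_misses * miss_penalty_cycles


def adjust_level(level, benefit_of_increase, cost_of_increase):
    """Step up one level when the estimated benefit exceeds the cost.

    Stepping down otherwise is a simplifying assumption of this sketch.
    The result is clamped to the valid index range of LEVELS.
    """
    if benefit_of_increase > cost_of_increase:
        return min(level + 1, len(LEVELS) - 1)
    return max(level - 1, 0)


def should_replicate(level, rng):
    """Probabilistic filter: replicate a block with probability LEVELS[level]."""
    return rng.random() < LEVELS[level]
```

For example, with an estimated benefit of 100 extra local hits at 20 cycles each (2000 cycles) against a cost of 5 extra misses at 300 cycles each (1500 cycles), `adjust_level(1, 2000, 1500)` moves from level 1 to level 2, so subsequent candidate blocks are replicated with a higher probability.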
doi:10.1109/micro.2006.10 dblp:conf/micro/BeckmannMW06 fatcat:ybh52lm5ajenvezppy2f2rbjpy