Location-aware cache management for many-core processors with deep cache hierarchy

Jongsoo Park, Richard M. Yoo, Daya S. Khudia, Christopher J. Hughes, Daehyun Kim
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13)
As cache hierarchies become deeper and the number of cores on a chip increases, managing caches becomes more important for performance and energy. However, current hardware cache management policies do not always adapt optimally to the application's behavior: e.g., caches may be polluted by data structures whose locality cannot be captured by the caches, and producer-consumer communication incurs multiple round trips of coherence messages per cache line transferred. We propose load and store instructions that carry hints regarding into which cache(s) the accessed data should be placed. Our instructions allow software to convey locality information to the hardware, while incurring minimal hardware cost and not affecting correctness. Our instructions provide a 1.07× speedup and a 1.24× energy efficiency boost, on average, according to simulations on a 64-core system with private L1 and L2 caches. With a large shared L3 cache added, the benefits increase, providing 1.33× energy reduction on average.
doi:10.1145/2503210.2503224 dblp:conf/sc/ParkYKHK13 fatcat:yvtqvwtg3rbnbcfgdbamqq5dy4