Software-Hardware Cooperative Memory Disambiguation

R. Huang, A. Garg, M. Huang
The Twelfth International Symposium on High-Performance Computer Architecture, 2006.

In high-end processors, increasing the number of in-flight instructions can improve performance by overlapping useful processing with long-latency accesses to main memory. Buffering these instructions, however, requires a tremendous amount of microarchitectural resources, and large structures hurt processor clock speed and energy efficiency. Innovations that use these resources effectively and efficiently are therefore needed. In this paper, we target the load-store queue, which implements dynamic memory disambiguation and is among the least scalable structures in a modern microprocessor. We propose using software assistance to identify load instructions that are guaranteed not to overlap with earlier pending stores, and preventing those loads from competing for resources in the load-store queue. We show that the design is practical, requiring off-line analyses and minimal architectural support. It is also very effective: for floating-point applications, more than 40% of loads can bypass the load-store queue. This reduces resource pressure and can lead to significant performance improvements.
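The bypass idea can be illustrated with a small occupancy model. This is a hypothetical sketch, not the authors' design: it assumes a single in-order dispatch stream, a load-store queue with a fixed number of entries and no retirement, and a per-load `safe` flag standing in for whatever software hint marks a load as guaranteed not to overlap older pending stores. The names `MemOp`, `LoadStoreQueue`, `overlaps`, `run`, `EXAMPLE`, and `conflicting_loads` are invented for the example. Ordinary loads take a queue entry and are checked against older stores by byte-range overlap; flagged loads skip both the entry and the search.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MemOp:
    kind: str           # "load" or "store"
    addr: int           # starting byte address
    size: int           # access width in bytes
    safe: bool = False  # hypothetical software hint: load cannot overlap any older pending store


def overlaps(a: MemOp, b: MemOp) -> bool:
    """True if the byte ranges [addr, addr + size) of a and b intersect."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size


@dataclass
class LoadStoreQueue:
    capacity: int
    entries: List[MemOp] = field(default_factory=list)
    bypassed: int = 0
    stalled: int = 0
    conflicting_loads: int = 0

    def older_conflicts(self, load: MemOp) -> List[MemOp]:
        """Associative search an ordinary load needs: queued stores whose bytes overlap it.

        Called before the load is appended, so every queued entry is older.
        """
        return [e for e in self.entries if e.kind == "store" and overlaps(e, load)]

    def dispatch(self, op: MemOp) -> str:
        """Dispatch one memory op in program order; return 'bypass', 'alloc' or 'stall'."""
        if op.kind == "load" and op.safe:
            # Software guarantees no overlap, so no entry and no search are needed.
            self.bypassed += 1
            return "bypass"
        if len(self.entries) >= self.capacity:
            # A real core would stall dispatch until an entry frees; this model
            # never retires entries and only counts the event.
            self.stalled += 1
            return "stall"
        if op.kind == "load" and self.older_conflicts(op):
            self.conflicting_loads += 1
        self.entries.append(op)
        return "alloc"


def run(stream: List[MemOp], capacity: int) -> LoadStoreQueue:
    """Dispatch a whole stream into a fresh queue and return it for inspection."""
    lsq = LoadStoreQueue(capacity)
    for op in stream:
        lsq.dispatch(op)
    return lsq


# Hypothetical program-order stream; addresses are made up for illustration.
EXAMPLE = [
    MemOp("store", 0x100, 8),
    MemOp("load", 0x200, 8, safe=True),  # e.g. a read no pending store can touch
    MemOp("load", 0x104, 4),             # overlaps bytes 0x104..0x107 of the store above
    MemOp("load", 0x300, 8, safe=True),
    MemOp("store", 0x108, 8),
    MemOp("load", 0x400, 8),
    MemOp("load", 0x500, 8),
]

if __name__ == "__main__":
    hinted = run(EXAMPLE, capacity=4)
    plain = run([MemOp(o.kind, o.addr, o.size) for o in EXAMPLE], capacity=4)
    print("with hints:   ", hinted.bypassed, "bypassed,", hinted.stalled, "stalled")
    print("without hints:", plain.bypassed, "bypassed,", plain.stalled, "stalled")
```

With the hints, the two flagged loads bypass the four-entry queue and only one dispatch stalls; with the flags cleared, the same seven operations stall three times. Because the model never retires entries, it only illustrates reduced queue pressure, not actual performance.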
doi:10.1109/hpca.2006.1598133 dblp:conf/hpca/HuangGH06 fatcat:cdavjenw2fcvxkgh6uy3w4bvae