Checkpointed early load retirement
11th International Symposium on High-Performance Computer Architecture
Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions, they also prevent other instructions from moving through the in-order reorder buffer (ROB) and retire. As a result, the processor quickly fills up with uncommitted instructions, and computation ultimately stalls. To attack this problem, we propose checkpointed early load retirement, a mechanism that combines register
... ointing and back-end-i.e., at retirement-load-value prediction. When a long-latency load hits the ROB head unresolved, the processor enters Clear mode by (1) taking a Checkpoint of the architectural registers, (2) supplying a Load-value prediction to consumers, and (3) EARly-retiring the long-latency load. This unclogs the ROB, thereby "clearing the way" for subsequent instructions to retire, and also allowing instructions dependent on the long-latency load to execute sooner. When the actual value returns from memory, it is compared against the prediction. A misprediction causes the processor to roll back to the checkpoint, discarding all subsequent computation. The benefits of executing in Clear mode come from providing early forward progress on correct predictions, and from warming up caches and other structures on wrong predictions. Our evaluation shows that a Clear implementation with support for four checkpoints yields an average speedup of 1.12 for both eleven integer and eight floating-point applications (1.27 and 1.19 for five integer and five floatingpoint memory-bound applications, respectively), relative to a contemporary out-of-order processor with an aggressive hardware prefetcher.