Performance Limits of Trace Caches

Matt Postiff, Gary S. Tyson, Trevor N. Mudge
1999 Journal of Instruction-Level Parallelism  
A growing number of studies have explored the use of trace caches as a mechanism to increase instruction fetch bandwidth. The trace cache is a memory structure that stores statically non-contiguous but dynamically adjacent instructions in contiguous memory locations. When coupled with an aggressive trace or multiple branch predictor, it can fetch multiple basic blocks per cycle using a single-ported cache structure. This paper compares trace cache performance to the theoretical limit of a three-block fetch mechanism. The three-block fetch mechanism is modeled by an idealized 3-ported instruction cache with a zero-latency alignment network. Several new metrics are defined to formalize analysis of the trace cache. These include fragmentation, duplication, indexability, and efficiency metrics. We show that performance is limited more by branch mispredictions than by the ability to fetch multiple blocks per cycle. As branch prediction improves, high duplication and the resulting low efficiency are shown to be among the reasons that the trace cache does not reach its upper bound. Based on the shortcomings of the trace cache shown in this paper, we identify some potential future research areas.
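The trace cache idea described above can be illustrated with a minimal Python sketch. It is a toy model written for this summary, not the simulator or the metric definitions used in the paper: the class name `ToyTraceCache`, the exact-match lookup key, the instruction addresses, and the helper `stored_copies_per_unique_instruction` are all hypothetical. The helper is only a rough stand-in for the intuition behind duplication (the same instruction stored in more than one trace); the paper's formal metrics may be defined differently.

```python
from typing import Dict, List, Tuple

# Hypothetical toy model: a basic block is a list of instruction addresses.
Block = List[int]
# A trace is identified by its start address plus the outcomes of its internal branches.
TraceKey = Tuple[int, Tuple[bool, ...]]


class ToyTraceCache:
    """Illustrative trace cache: maps (start PC, branch outcomes) to one stored trace."""

    def __init__(self, max_blocks: int = 3) -> None:
        self.max_blocks = max_blocks
        self.lines: Dict[TraceKey, List[int]] = {}

    def fill(self, blocks: List[Block], outcomes: Tuple[bool, ...]) -> None:
        """Store up to max_blocks dynamically adjacent blocks as one contiguous line."""
        blocks = blocks[: self.max_blocks]
        # One branch outcome separates each pair of consecutive stored blocks.
        key = (blocks[0][0], tuple(outcomes[: len(blocks) - 1]))
        self.lines[key] = [addr for block in blocks for addr in block]

    def fetch(self, pc: int, predicted: Tuple[bool, ...]) -> List[int]:
        """Return the whole trace on a hit, or an empty list on a miss."""
        return self.lines.get((pc, tuple(predicted)), [])


def stored_copies_per_unique_instruction(cache: ToyTraceCache) -> float:
    """Rough, assumed duplication measure: stored instruction slots per distinct address."""
    stored = [addr for line in cache.lines.values() for addr in line]
    return len(stored) / len(set(stored))


# Two dynamic paths that share their first and last blocks but differ in the middle.
block_a, block_b = [0x100, 0x104], [0x200, 0x204, 0x208]
block_c, block_d = [0x300], [0x400, 0x404]

cache = ToyTraceCache()
cache.fill([block_a, block_b, block_c], (True, True))
cache.fill([block_a, block_d, block_c], (False, True))

# A single lookup returns three non-contiguous blocks at once.
trace = cache.fetch(0x100, (True, True))
```

In this sketch, blocks `block_a` and `block_c` are stored twice because they belong to two different traces, which is the kind of redundancy the duplication and efficiency discussion refers to. A real design would also bound the line length in instructions and handle partial matches, which this toy model omits.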
dblp:journals/jilp/PostiffTM99 fatcat:5w57vyi4f5adzidqp5by2l2le4