A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2006; you can also visit <a rel="external noopener" href="http://research.ac.upc.edu/CAP/hpc/Papers/2001/aramirez2001aJ.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Institute of Electrical and Electronics Engineers (IEEE)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/yfvtieuumfamvmjlc255uckdlm" style="color: black;">Proceedings of the IEEE</a>
The design of higher performance processors has been following two major trends: increasing the pipeline depth to allow faster clock rates, and widening the pipeline to allow parallel execution of more instructions. Designing a higher performance processor implies balancing all the pipeline stages to ensure that overall performance is not dominated by any of them. This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy. This paper explores the challenges faced by the instruction fetch stage for a variety of processor designs, from early pipelined processors, to the more aggressive wide issue superscalars. We describe the different fetch engines proposed in the literature, the performance issues involved, and some of the proposed improvements. We also show how compiler techniques that optimize the layout of the code in memory can be used to improve the fetch performance of the different engines described. Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to several basic blocks per cycle: the evolution of the mechanism surrounding the instruction cache, and the different compiler optimizations used to better employ these mechanisms.
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/5.964440">doi:10.1109/5.964440</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/yp3a5e42wbfjtfkqsyfr5dkrcq">fatcat:yp3a5e42wbfjtfkqsyfr5dkrcq</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20060919031113/http://research.ac.upc.edu/CAP/hpc/Papers/2001/aramirez2001aJ.pdf" title="fulltext PDF download">Web Archive [PDF]</a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/5.964440">ieee.com</a>