A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2010; you can also visit <a rel="external noopener" href="http://people.engr.ncsu.edu:80/hzhou/dce_pact_2005_ieee.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/i2tihayjsjhmhmrqaqe2apcipy" style="color: black;">14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)</a>
Current integration trends embrace the prosperity of single-chip multi-core processors. Although multi-core processors deliver significantly improved system throughput, single-thread performance is not addressed. In this paper, we propose a new execution paradigm that utilizes multi-cores on a single chip collaboratively to achieve high performance for single-thread memoryintensive workloads while maintaining the flexibility to support multithreaded applications. The proposed execution<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/pact.2005.18">doi:10.1109/pact.2005.18</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/IEEEpact/Zhou05.html">dblp:conf/IEEEpact/Zhou05</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/trbkqaflhndydpqebxkimzj3tm">fatcat:trbkqaflhndydpqebxkimzj3tm</a> </span>
more »... dual-core execution, consists of two superscalar cores (a front and back processor) coupled with a queue. The front processor fetches and preprocesses instruction streams and retires processed instructions into the queue for the back processor to consume. The front processor executes instructions as usual except for cache-missing loads, which produce an invalid value instead of blocking the pipeline. As a result, the front processor runs far ahead to warm up the data caches and fix branch mispredictions for the back processor. In-flight instructions are distributed in the front processor, the queue, and the back processor, forming a very large instruction window for single-thread out-oforder execution. The proposed architecture incurs only minor hardware changes and does not require any large centralized structures such as large register files, issue queues, load/store queues, or reorder buffers. Experimental results show remarkable latency hiding capabilities of the proposed architecture, even outperforming more complex single-thread processors with much larger instruction windows than the front or back processor.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20100613165334/http://people.engr.ncsu.edu:80/hzhou/dce_pact_2005_ieee.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ca/bd/cabd62a8d96edab241927f0c8056d49e824e0a6b.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/pact.2005.18"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>