A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2007; you can also visit <a rel="external noopener" href="http://www.cs.ucr.edu/~vahid/pubs/date04_clf.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="IEEE Comput. Soc">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/qjrrvry5ubgdlarkymvlxuip6m" style="color: black;">Proceedings Design, Automation and Test in Europe Conference and Exhibition</a>
In previous work, we showed the benefits and feasibility of having a processor dynamically partition its executing software such that critical software kernels are transparently partitioned to execute as a hardware coprocessor on configurable logic, an approach we call warp processing. The configurable logic place and route step is the most computationally intensive part of such hardware/software partitioning, normally running for many minutes or hours on powerful desktop processors. In<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/date.2004.1268892">doi:10.1109/date.2004.1268892</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/date/LyseckyV04.html">dblp:conf/date/LyseckyV04</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/n3b6srdjmnb37gss7yfwpnraga">fatcat:n3b6srdjmnb37gss7yfwpnraga</a> </span>
..., dynamic partitioning requires place and route to execute in just seconds and on a lean embedded processor. We have therefore designed a configurable logic architecture specifically for dynamic hardware/software partitioning. Through experiments with popular benchmarks, we show that by focusing the FPGA architecture design on the goal of software kernel speedup, rather than on the more general goal of ASIC prototyping, we can perform place and route for our architecture 50 times faster, using 10,000 times less data memory and 1,000 times less code memory, than popular commercial tools mapping to commercial configurable logic. Yet we show that partitioning even just one loop yields speedups (2x on average, and as much as 4x) and energy savings (33% on average, and up to 74%) comparable to those obtained with commercial tools and fabrics. Thus, our configurable logic architecture is a good candidate for platforms that will support dynamic hardware/software partitioning, and it enables ultra-fast desktop tools for hardware/software partitioning and even for fast configurable logic design in general.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20070416213210/http://www.cs.ucr.edu/~vahid/pubs/date04_clf.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/93/9d/939ddf5fdcead4616ead20360db8a6f8699ff0a0.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/date.2004.1268892"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>