A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit <a rel="external noopener" href="http://cseweb.ucsd.edu/~swanson/papers/TECS-CCores.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Association for Computing Machinery (ACM)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/3yu4mbaainhw5gol4yayc35pfe" style="color: black;">ACM Transactions on Embedded Computing Systems</a>
As chip designers face the prospect of increasingly dark silicon, there is increased interest in incorporating energy-efficient specialized coprocessors into general-purpose designs. For specialization to be a viable means of leveraging dark silicon, it must provide energy savings over the majority of execution for large, diverse workloads, and this will require deploying coprocessors in large numbers. Recent work has shown that automatically generated application-specific coprocessors can<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2584657">doi:10.1145/2584657</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/pvkgf2jbynf7zkdw2257y6zq2y">fatcat:pvkgf2jbynf7zkdw2257y6zq2y</a> </span>
more »... ly improve energy efficiency, but it is not clear that current techniques will scale to Coprocessor-Dominated Architectures (CoDAs) with hundreds or thousands of coprocessors. We show that scaling CoDAs to include very large numbers of coprocessors is challenging because of the energy cost of interconnects, the memory system, and leakage. These overheads grow with the number of coprocessors and, left unchecked, will squander the energy gains that coprocessors can provide. The article presents a detailed study of energy costs across a wide range of tiled CoDA designs and shows that careful choice of cache configuration, tile size, coarse-grain power management and transistor implementation can limit the growth of these overheads. For multithreaded workloads, designer must also take care to avoid excessive contention for coprocessors, which can significantly increase energy consumption. The results suggest that, for CoDAs that target larger workloads, amortizing shared overheads via multithreading can provide up to 3.8× reductions in energy per instruction, retaining much of the 5.3× potential of smaller designs.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170829012422/http://cseweb.ucsd.edu/~swanson/papers/TECS-CCores.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/ec/9e/ec9edc0368639014ece66f074109bee49a347da6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/2584657"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>