Efficient complex operators for irregular codes

Jack Sampson, Ganesh Venkatesh, Nathan Goulding-Hotta, Saturnino Garcia, Steven Swanson, Michael Bedford Taylor
2011 IEEE 17th International Symposium on High Performance Computer Architecture
Complex "fat operators" are important contributors to the efficiency of specialized hardware. This paper introduces two new techniques for constructing efficient fat operators featuring up to dozens of operations with arbitrary and irregular data and memory dependencies. These techniques focus on minimizing critical path length and load-use delay, which are key concerns for irregular computations. Selective Depipelining (SDP) is a pipelining technique that allows fat operators containing several, possibly dependent, memory operations. SDP allows memory requests to operate at a faster clock rate than the datapath, saving power in the datapath and improving memory performance. Cachelets are small, customized, distributed L0 caches embedded in the datapath to reduce load-use latency. We apply these techniques to Conservation Cores (c-cores) to produce coprocessors that accelerate irregular code regions while still providing superior energy efficiency. On average, these enhanced c-cores reduce energy-delay product (EDP) by 2× and area by 35% relative to c-cores. They are up to 2.5× faster than a general-purpose processor and reduce energy consumption by up to 8× for a variety of irregular applications, including several SPECINT benchmarks.
Recent work [27] examined one approach to improving the energy efficiency of these codes by converting dark silicon into a collection of energy-saving, application-specialized cores called Conservation Cores, or c-cores. That approach emphasized energy savings while matching the performance of a conventional processor, but three factors limited its performance and energy efficiency gains. First, synchronization with the memory system restricted
doi:10.1109/hpca.2011.5749754 dblp:conf/hpca/SampsonVGGST11 fatcat:yqjxqk44jba4tjjtwweqcwpypi