Interconnect synthesis of heterogeneous accelerators in a shared memory architecture
2015 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED)
An accelerator-rich architecture (ARA) is composed of heterogeneous accelerators with an on-chip memory system. Compared to the general-purpose processors, an accelerator demands short and predictable latency to its local on-chip memory to satisfy its performance target. Moreover, an accelerator requires a much higher off-chip memory bandwidth than a CPU since it consumes much more data in a given time period. Therefore, a customized on-chip memory system design is one of the keys to an
... t ARA. In this work we provide a twolayer interconnect synthesis method. We first provide an optimal layer of partial crossbar that connects the heterogeneous accelerators and shared memory banks with a minimum number of switches. The second layer of interconnect tries to interleave possible conflicting long-burst memory requests for prefetching data from off-chip memory. The experimental results show that we can reduce more than 45% of the switches of the partial crossbar compared to the best known method. This further leads to 53% reduction of LUTs and 34% reduction of slice utilization on a 30accelerator FPGA prototype. Furthermore, the performance of an ARA can be improved by 36% -52% with a well-designed interleaved network in a real ARA prototype for medical imaging applications. This prototype also shows a 7.44x energy efficiency gain over the state-of-the-art Xeon processors.