High-throughput, energy-efficient network-on-chip-based hardware accelerators

Turbo Majumder, Partha Pratim Pande, Ananth Kalyanaraman
2013 Sustainable Computing: Informatics and Systems  
Keywords: Hardware accelerator; Energy efficient; Long-range links; Wireless

Abstract: Several emerging application domains in scientific computing demand high computation throughputs to achieve terascale or higher performance. Dedicated centers hosting scientific computing tools on a few high-end servers could rely on hardware accelerator co-processors that contain multiple lightweight custom cores interconnected through an on-chip network. With increasing workloads, these many-core platforms need to deliver high overall computation throughput while also being energy-efficient. Conventional multicore architectures can achieve only limited computational throughput due to the inherent multi-hop nature of the on-chip network infrastructure. By inserting long-range links that act as shortcuts in a regular network-on-chip (NoC) architecture, both the achievable bandwidth and energy efficiency of a multicore platform can be significantly enhanced. In this paper, we first propose a NoC-driven use-case model for throughput-oriented scientific applications, and subsequently use the model to study the effect of using long-range links in conjunction with different resource allocation strategies on reducing the overall on-chip communication and enhancing computational throughput. NoCs with both wired and on-chip wireless links are explored in the study. We also evaluate our NoC-based platforms with respect to energy efficiency and power consumption. We analyze how throughput and power consumption are correlated with the statistical properties of the application traffic. In addition, we compare and analyze chip-level thermal profiles for these alternatives. Our experiments using kernels from a popular phylogenetic inference application suite show that we can deliver computation throughput over 10^11 operations per second, consuming ∼0.5 nJ per operation, while ensuring that on-chip temperature variation is within 26 °C.
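The abstract's point about long-range shortcuts reducing multi-hop communication can be illustrated with a toy model. The sketch below is not the paper's method: it assumes an 8 x 8 mesh, two hypothetical corner-to-corner shortcut links, uniform all-to-all traffic, and shortest-path routing, and it compares only the average hop count with and without the shortcuts. The function names `mesh_links` and `average_hops` are made up for illustration.

```python
from collections import deque
from itertools import product


def mesh_links(n):
    """Undirected links of an n x n mesh; nodes are (row, col) tuples."""
    links = set()
    for r, c in product(range(n), repeat=2):
        if r + 1 < n:
            links.add(((r, c), (r + 1, c)))
        if c + 1 < n:
            links.add(((r, c), (r, c + 1)))
    return links


def average_hops(n, links):
    """Mean shortest-path hop count over all ordered pairs of distinct nodes."""
    adj = {node: [] for node in product(range(n), repeat=2)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    total = 0
    for src in adj:
        # Breadth-first search gives hop counts from src to every node.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    pairs = len(adj) * (len(adj) - 1)
    return total / pairs


n = 8  # assumed mesh size, chosen only for illustration
base = mesh_links(n)
# Hypothetical long-range links joining opposite corners of the mesh.
shortcuts = {((0, 0), (n - 1, n - 1)), ((0, n - 1), (n - 1, 0))}

print("mesh only:      ", average_hops(n, base))
print("with shortcuts: ", average_hops(n, base | shortcuts))
```

In this toy setting the shortcuts can only shorten paths, so the second figure is lower than the first. How much real throughput and energy benefit follows depends on link placement, traffic statistics, and resource allocation, which is what the paper studies.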
doi:10.1016/j.suscom.2013.01.001 fatcat:3lilgssw75apbbyda7j7iqd72e