High performance reconfigurable computing for numerical simulation and deep learning

Lin Gan, Ming Yuan, Jinzhe Yang, Wenlai Zhao, Wayne Luk, Guangwen Yang
2020 CCF Transactions on High Performance Computing  
Due to their customizable on-chip resources, reconfigurable computing platforms such as FPGAs are able to achieve better time-to-solution and energy-to-solution than general-purpose processors. They have been widely adopted in many important applications, from traditional numerical processing to emerging deep learning systems. Since FPGAs have become promising options for current and future high performance computing, this report summarises and analyses recent FPGA-related efforts, including
the latest industrial approaches, the state-of-the-art reconfigurable solutions, and various issues such as on-chip resources and development productivity.

deploy more computing units into a single chip, and accommodate more computing chips into a single system, the expenses, the scaling efficiency and the power consumption are becoming major bottlenecks. While plenty of computing resources are available for scaling up, problems such as computing efficiency, bandwidth, and load balance have become urgent. At the same time, as real-world applications grow more complicated, traditional general-purpose chips can no longer meet the demands that some important applications place on memory, bandwidth, and computing efficiency. For example, computing efficiency, the memory wall, and power consumption become increasingly serious problems when mapping traditional numerical applications, such as geoscience applications, onto leading-edge supercomputing systems. New solutions are needed to address these issues.

Compared with traditional multi-core or many-core chips, which depend on deploying more computing units for parallel computing, reconfigurable computing systems, such as those based on field programmable gate array (FPGA) technology, bring a completely different computing pattern that relies mostly on a data flow computing model to achieve better performance. Because FPGAs are reconfigurable, their on-chip resources can be configured into several long pipelines of different concurrent units that fit the selected algorithm well. Data from memory is then streamed through the different pipelines for higher computing efficiency.
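The streaming pattern described above can be sketched in software as a chain of generator stages. This is a minimal Python model, not FPGA code and not taken from the report: the stage functions, the 3-point averaging window, and the cycle-count formulas (which assume one cycle per stage and one new item accepted per cycle) are illustrative assumptions.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def stage(fn: Callable[[float], float], upstream: Iterable[float]) -> Iterator[float]:
    """One pipeline stage: take a value from upstream, emit one result."""
    for value in upstream:
        yield fn(value)


def stencil3(upstream: Iterable[float]) -> Iterator[float]:
    """Streaming 3-point average; the deque stands in for a small on-chip buffer."""
    window = deque(maxlen=3)
    for value in upstream:
        window.append(value)
        if len(window) == 3:  # emit only once the window is full
            yield sum(window) / 3.0


def build_pipeline(source: Iterable[float]) -> Iterator[float]:
    """Chain three hypothetical stages: scale, smooth, offset."""
    scaled = stage(lambda x: 2.0 * x, source)
    smoothed = stencil3(scaled)
    return stage(lambda x: x + 1.0, smoothed)


def pipelined_cycles(n_items: int, depth: int) -> int:
    """Idealised cost: fill the pipeline once, then one result per cycle."""
    return n_items + depth - 1


def sequential_cycles(n_items: int, depth: int) -> int:
    """Idealised cost when each item passes through all stages before the next starts."""
    return n_items * depth


# Values flow through all stages one element at a time; no intermediate
# array is ever materialised between stages.
result = list(build_pipeline([1.0, 2.0, 3.0, 4.0, 5.0]))
print(result)
print(pipelined_cycles(1000, 3), sequential_cycles(1000, 3))
```

Under these assumptions, 1000 items through a depth-3 pipeline take 1002 cycles instead of 3000. Real designs differ in stage latency and memory bandwidth, so the formulas only indicate why long pipelines favour streamed data.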
Furthermore, the low clock frequency of reconfigurable systems reduces their power consumption, giving them significant potential for better energy efficiency. These advantages, together with the flexibility of their on-chip resources, which can support, for example, multiple numerical precisions, make reconfigurable systems a promising candidate for both current- and next-generation supercomputing architectures. In the meantime, reconfigurable computing systems such as FPGAs are becoming increasingly popular in important areas such as numerical scientific applications and machine learning. Inspiring results have been achieved in providing solutions with better time-to-solution and energy-to-solution, and will be discussed as a major part of this report.

In all, this report summarises and analyses recent FPGA-related efforts in HPC. To that end, it describes in detail the working mechanisms and state-of-the-art solutions of the FPGA-based reconfigurable computing pattern (detailed in Sect. 2), which is completely different from that of traditional multi-core or many-core HPC chips. It then looks at some of the inspiring work that ranges from traditional numerical applications (detailed in Sect. 4) to emerging deep learning technologies (detailed in Sect. 5), and discusses additional major issues (detailed in Sect. 6).
doi:10.1007/s42514-020-00032-x fatcat:mbnb73zazzgohhe4quhuqlryky