Multi-granular Arithmetic in a Coarse-Grain Reconfigurable Architecture

Stef Louwers, Luc Waeijen, Mark Wijtvliet, Ruud Koolen, Henk Corporaal
2016 Euromicro Conference on Digital System Design (DSD)
Master's thesis: Energy efficient multi-granular arithmetic in a coarse-grain reconfigurable architecture. Louwers, S.T. Award date: 2016. Student thesis, Eindhoven University of Technology.

Abstract

Coarse-Grain Reconfigurable Architectures (CGRAs) are a class of architectures that can be dynamically adapted to match an application, similar to Field Programmable Gate Arrays (FPGAs). Unlike FPGA systems, which can be programmed at the gate level, CGRA systems can be programmed as a network of higher-level operations such as addition and multiplication. By being configurable at a coarser granularity, these systems are much more energy efficient than an FPGA, but this comes at the cost of some adaptability.

In a CGRA system, the width of the configurable operation units is traditionally a difficult design decision. If this width is too narrow, not all operations are natively possible, and software support is required to calculate larger operations. On the other hand, if the width is too wide, energy is wasted on the computation of unnecessary operand bits.

One way of solving this issue is to design the operation circuits such that several units can be combined efficiently to form a single, bigger arithmetic unit. Each operation performed by the application can then be computed by a combined arithmetic unit of exactly the width the application requires. Computing wide operations this way is not as efficient as a native wide circuit, but the upside of this approach is that narrower operations can be performed much more efficiently than in the alternative design. We call this concept Multi-Granular Arithmetic.

In this report, we investigate the details of performing common arithmetic operations in a multi-granular setting in the context of the BLOCKS CGRA architecture. For the operations of addition, accumulation, multiplication, and multiply-accumulation, we show that the multi-granular design is feasible, with a very modest efficiency cost for wide operations and substantial efficiency gains for narrow operations.
Using an analysis based on a silicon synthesis toolflow, we demonstrate that a narrow multiplication can be performed at an energy cost 15 times lower than the native alternative under realistic conditions, while the matching wide multiplication costs a factor of 1.5 more energy.
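
The idea of combining narrow units into one wider arithmetic unit can be illustrated with a small software model. The sketch below is not the BLOCKS hardware design. It assumes 8-bit units, and the names `narrow_add`, `narrow_mul`, `split`, `wide_add`, and `wide_mul` are hypothetical, chosen only for this illustration. Wide addition chains narrow adders through their carry, and wide multiplication sums shifted partial products from narrow multipliers (the final summation uses ordinary Python integers for brevity).

```python
def split(value, units, width=8):
    """Split a value into `units` chunks of `width` bits, least significant first."""
    mask = (1 << width) - 1
    return [(value >> (i * width)) & mask for i in range(units)]


def narrow_add(a, b, carry_in, width=8):
    """Model of one narrow adder unit: returns (sum, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width


def wide_add(a, b, units, width=8):
    """Add two (units * width)-bit values by chaining narrow adders via the carry."""
    result, carry = 0, 0
    chunks = zip(split(a, units, width), split(b, units, width))
    for i, (x, y) in enumerate(chunks):
        s, carry = narrow_add(x, y, carry, width)
        result |= s << (i * width)
    return result, carry


def narrow_mul(a, b, width=8):
    """Model of one narrow multiplier unit: returns the 2*width-bit product."""
    mask = (1 << width) - 1
    return (a & mask) * (b & mask)


def wide_mul(a, b, units, width=8):
    """Multiply two wide values by summing shifted narrow partial products."""
    total = 0
    for i, x in enumerate(split(a, units, width)):
        for j, y in enumerate(split(b, units, width)):
            total += narrow_mul(x, y, width) << ((i + j) * width)
    return total
```

In this model, `wide_add(0x12FF, 0x0001, units=2)` returns `(0x1300, 0)`: the carry out of the low 8-bit unit feeds the high unit. A 16-bit `wide_mul` with `units=2` uses four 8-bit partial products, while an 8-bit operation occupies only a single unit.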
doi:10.1109/dsd.2016.98 dblp:conf/dsd/LouwersWWKC16 fatcat:kp4i52q62rclpjbk2rsr3jkf2a