Combining Rewriting-Logic, Architecture Generation, and Simulation to Exploit Coarse-Grained Reconfigurable Architectures

Carlos Morra, João Bispo, João M.P. Cardoso, Jürgen Becker
2008 16th International Symposium on Field-Programmable Custom Computing Machines
Introduction

Coarse-grained reconfigurable arrays (CGRAs) based on the VLIW concept can be interesting solutions for speeding up hotspots of certain applications. They rely on a 1-D array of Processing Elements (PEs), as illustrated in the example in Fig. 1. The PEs can typically be programmed to execute ALU operations, multiplications, load/store operations, etc. It is common to use a global register file for storing scalar variables and intermediate results. Although the VLIW concept offers only limited forms of spatial computing, using register files to communicate data globally eases the mapping process, since the interconnects between computing structures are reduced to register assignments and subsequent moves. However, compared to architectures that use interconnects between PEs, data communication between PEs in VLIW-based CGRAs is slower, and the cost increases with the number of ports of the register files.

CGRAs may differ in the number and complexity of their PEs. For instance, PEs may directly support 2-, 3-, and 4-operand instructions. To better customize a CGRA template suited to one or more kernels, we need approaches to evaluate those main differences. Our work addresses a global and unified methodology to exploit 1-D CGRAs, such as the one in Fig. 1.

Fig. 1. Typical 1-D array currently exploited. (Diagram labels: Processor; Memory; Arbiter; Configuration Controller; Configuration Memory; Multi-Port Register File; PE 1, PE 2, ..., PE N, each with an ALU; flag; Mask; ORed.)

Our methodology uses term-rewriting logic [2] to make the compiler aware of the instruction templates directly supported by each PE when mapping an imperative programming model to the target architecture. For early evaluation, the methodology is now enriched with a high-level clock-cycle simulator and an architecture generator. The architecture generator outputs, from a textual structural description, VHDL code ready for logic synthesis. Recent advances to our approach, previously presented in [1], allow us to analyze the effect of the number and the different types of PEs on performance, FPGA resource utilization, and maximum clock speed.

Architecture Exploration using Rewriting-Logic

Our proposed environment is presented in Fig. 2. The input is an extended, three-address, Static Single Assignment (SSA) intermediate form, generated by the Nau compiler [3] from the Java bytecode of a given class method.

Fig. 2. Proposed environment.
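To make the effect of the number of PEs on cycle count concrete, the following is a minimal sketch of a clock-cycle-level model of a 1-D array whose PEs communicate only through a shared register file. It is not the authors' simulator: the `Op` record, the three-opcode `ALU_OPS` table, the greedy `schedule` function, and the one-cycle latency per operation are all assumptions made for illustration. The input is a list of three-address SSA operations, so each register is written exactly once.

```python
from dataclasses import dataclass

# Hypothetical opcode set; the PE operations of the real architecture are not listed here.
ALU_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


@dataclass(frozen=True)
class Op:
    """One three-address SSA operation: dst = opcode(src1, src2)."""
    opcode: str
    dst: str
    src1: str
    src2: str


def schedule(ops, num_pes):
    """Greedily pack SSA operations into per-cycle bundles of at most num_pes ops.

    Assumes ops are listed with definitions before uses and that every
    operation takes exactly one cycle (an assumption of this sketch).
    """
    ready_at = {}  # SSA name -> first cycle in which its value can be read
    bundles = []
    for op in ops:
        # Live-in registers are not in ready_at and are readable from cycle 0.
        cycle = max(ready_at.get(op.src1, 0), ready_at.get(op.src2, 0))
        # Skip cycles whose PEs are all occupied.
        while cycle < len(bundles) and len(bundles[cycle]) >= num_pes:
            cycle += 1
        while len(bundles) <= cycle:
            bundles.append([])
        bundles[cycle].append(op)
        ready_at[op.dst] = cycle + 1
    return bundles


def simulate(bundles, registers, num_pes):
    """Execute one bundle per clock cycle; return (final registers, cycle count)."""
    regs = dict(registers)
    cycles = 0
    for bundle in bundles:
        if len(bundle) > num_pes:
            raise ValueError("bundle needs more PEs than the array provides")
        # Every PE reads the register file before any PE writes back,
        # so operations in one bundle see the values of the previous cycle.
        results = [(op.dst, ALU_OPS[op.opcode](regs[op.src1], regs[op.src2]))
                   for op in bundle]
        for dst, value in results:
            regs[dst] = value
        cycles += 1
    return regs, cycles


# Usage: t3 = (a + b) * (c - d) on arrays with two PEs and with one PE.
kernel = [
    Op("add", "t1", "a", "b"),
    Op("sub", "t2", "c", "d"),
    Op("mul", "t3", "t1", "t2"),
]
inputs = {"a": 1, "b": 2, "c": 7, "d": 3}
regs_2pe, cycles_2pe = simulate(schedule(kernel, 2), inputs, 2)  # 2 cycles
regs_1pe, cycles_1pe = simulate(schedule(kernel, 1), inputs, 1)  # 3 cycles
```

With two PEs the independent `add` and `sub` share a cycle, so the kernel finishes in two cycles instead of three. A real evaluation would also account for register-file port limits and multi-cycle operations, which this sketch ignores.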
The different types of PEs are described using term-rewriting rules, which are functional templates used by the Rewriting-Logic environment [4] to identify sets of SSA instructions that can be executed in each PE. The mapping process is managed using logic strategies. Different strategies can be defined and selected in order to find the effect of the number and types of PEs on the performance of the application being mapped. An example of a PE definition is given in Fig. 2, path (1). In this example, a two-level PE with an

* This work was partially supported by a DAAD/CRUP cooperation project entitled ACER. Cardoso and Bispo were also partially supported by project COBAYA, funded by the Portuguese Foundation for Science and Technology (FCT).
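As a rough illustration of how a rewriting rule can act as a PE template over SSA instructions, the sketch below fuses a multiplication followed by an addition into a single three-operand operation. It is written in plain Python and does not use the Rewriting-Logic environment of [4]. The `Instr` record, the `muladd` opcode, the `fuse_mul_add` rule, and the `live_out` parameter are hypothetical names chosen for this example, not the PE definition of Fig. 2.

```python
from collections import Counter
from typing import NamedTuple


class Instr(NamedTuple):
    """One SSA instruction: dst = op(*args)."""
    dst: str
    op: str
    args: tuple


def fuse_mul_add(instrs, live_out=()):
    """Rewrite `t = mul a, b` followed by `r = add t, c` into `r = muladd a, b, c`.

    The rule fires only when t has exactly one use and is not live-out,
    so removing the standalone multiplication cannot change the result.
    """
    uses = Counter(arg for ins in instrs for arg in ins.args)
    defs = {ins.dst: ins for ins in instrs}
    fused = set()  # destinations of multiplications absorbed into a muladd
    out = []
    for ins in instrs:
        if ins.op == "add":
            for i, arg in enumerate(ins.args):
                prod = defs.get(arg)
                if (prod is not None and prod.op == "mul"
                        and uses[arg] == 1 and arg not in live_out):
                    other = ins.args[1 - i]  # the remaining add operand
                    out.append(Instr(ins.dst, "muladd", prod.args + (other,)))
                    fused.add(arg)
                    break
            else:
                out.append(ins)  # no operand matched the template
        else:
            out.append(ins)
    # Drop the multiplications that now live inside a muladd.
    return [ins for ins in out if ins.dst not in fused]


# Usage: the mul/add pair collapses into one three-operand instruction.
program = [
    Instr("t1", "mul", ("a", "b")),
    Instr("t2", "add", ("t1", "c")),
    Instr("t3", "sub", ("t2", "d")),
]
mapped = fuse_mul_add(program)
# mapped == [Instr("t2", "muladd", ("a", "b", "c")), Instr("t3", "sub", ("t2", "d"))]
```

Each such rule corresponds to one instruction shape a PE could execute directly. A strategy would then decide which rules to apply and in what order, which this sketch leaves out.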
doi:10.1109/fccm.2008.37 dblp:conf/fccm/MorraBCB08 fatcat:dkvumlsavrbedftfcsljb6qg64