Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay [article]

Cheng Liu, Ho-Cheung Ng, Hayden Kwok-Hay So
2015 arXiv   pre-print
In this work, an automatic nested loop acceleration framework utilizing a regular soft coarse-grained reconfigurable array (SCGRA) overlay is presented.  ...  Offloading compute-intensive nested loops to execute on FPGA accelerators has been demonstrated by numerous researchers as an effective performance enhancement technique across numerous application domains  ...  NESTED LOOP ACCELERATOR DESIGN FRAMEWORK By using a regular SCGRA overlay built on top of the physical FPGA devices, we have developed an automatic nested loop acceleration framework called QuickDough.  ... 
arXiv:1509.00042v1 fatcat:vmqhs5uas5gornb25pkkl6digi

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective [article]

Artur Podobas, Kentaro Sano, Satoshi Matsuoka
2020 arXiv   pre-print
We summarize nearly three decades of literature on the subject, with particular focus on premises behind the different CGRA architectures and how they have evolved.  ...  We find that there are ample opportunities for future research on CGRAs, in particular with respect to size, functionality, support for parallel programming models, and to evaluate more complex applications  ...  ACKNOWLEDGEMENTS This article is based on results obtained from a project commissioned by the New energy and Industrial Technology Development Organization (NEDO).  ... 
arXiv:2004.04509v1 fatcat:sxnq32chxjf6hfc5ygjsxqjwl4

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

Artur Podobas, Kentaro Sano, Satoshi Matsuoka
2020 IEEE Access  
We summarize nearly three decades of literature on the subject, with a particular focus on the premise behind the different CGRAs and how they have evolved.  ...  Recently, a particular branch of reconfigurable architecture - the Field-Programmable Gate Arrays (FPGAs) [9] - has experienced a surge of renewed interest for use in High-Performance Computing (HPC), and  ...  This article is based on results obtained from a project commissioned by New Energy and Industrial Technology Development Organization (NEDO).  ... 
doi:10.1109/access.2020.3012084 fatcat:xx6k4lxbjbc4tjebbymp42w634

Microarchitectural Comparison of the MXP and Octavo Soft-Processor FPGA Overlays

Charles Eric Laforest, Jason H. Anderson
2017 ACM Transactions on Reconfigurable Technology and Systems  
To reduce and accelerate the design effort, we can implement an overlay architecture on the FPGA, on which we then more easily construct the desired system but at a large cost in performance and area relative  ...  In this work, we compare the micro-architecture, performance, and area of two soft-processor overlays: the Octavo multi-threaded soft-processor and the MXP soft vector processor.  ...  The E/F Select block simply selects one of the E/F bits based on the address, while the I/O Detect block signals if the address refers to an I/O port.  ... 
doi:10.1145/3053679 fatcat:b6mdbia2zvbp5nro2q7qs46zhy

Exploring Trade-Offs between Specialized Dataflow Kernels and a Reusable Overlay in a Stereo Matching Case Study

Tobias Kenter, Henning Schmitz, Christian Plessl
2015 International Journal of Reconfigurable Computing  
a vector coprocessor with large vector lengths, which is implemented as a form of programmable overlay on the application FPGAs of a Convey HC-1.  ...  As common starting point, we employ a kernel-centric design approach, where computational hotspots in an application are identified and individually accelerated on FPGA.  ...  overlay architectures on FPGAs in a nontrivial and practically relevant use case.  ... 
doi:10.1155/2015/859425 fatcat:arxkxznrofh23jscetqdztl4cy

2018 Index IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 26

2018 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
., +, TVLSI June 2018 1183-1191 Fast Neural Network Training on FPGA Using Quasi-Newton Optimization Method.  ...  Che, W., +, TVLSI April 2018 733-743 Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA.  ... 
doi:10.1109/tvlsi.2019.2892312 fatcat:rxiz5duc6jhdzjo4ybcxdajtbq

Hardware-Accelerated Platforms and Infrastructures for Network Functions: A Survey of Enabling Technologies and Research Studies

Prateek Shantharama, Akhilesh S. Thyagaturu, Martin Reisslein
2020 IEEE Access  
However, the hard NoC system outperforms both the soft NoC and the bus-based FPGA implementation.  ...  × 4 CGRA interconnected by 2D mesh, (b) FU placement on routing fabric with bidirectional link support.  ... 
doi:10.1109/access.2020.3008250 fatcat:kv4znpypqbatfk2m3lpzvzb2nu

0 Instruction Set Architecture [chapter]

2003 Digital Design and Computer Organization  
For the purpose of loop transformations, formal mathematical models of loop nests, such as the Polyhedral model [6], are used.  ...  FPGA 15 .  ...  3 Found initial prototype mapped on : core_3b 4 Loading information from APEX file ... 5 Initial prototype has 3 issue -slots 6 7 Searching for best 'ed ' fitness solution ... 8 Using ' issue -  ... 
doi:10.1201/b12403-15 fatcat:mygaz2meibgljew5tzvmuw6x5i

Merging Datapaths using Data Processing Graphs

Philip Rohde
2021
This was evaluated on the example of hardware accelerators that are generated from C code using PIRANHA, a plugin for the GCC compiler.  ...  As the executed software never starts two accelerators in parallel, the resource utilization on the FPGA can be reduced by sharing common resources. It turned out that this problem [...]  ...  Like other HLS tools, PIRANHA analyzes and identifies loops, which may be nested, that are worth accelerating.  ... 
doi:10.26083/tuprints-00011314 fatcat:a5fwatove5cbnnvdpnxnicc7tq