Efficient Hardware Optimization for CNN

Seda Güzel Aydın, Hasan Şakir Bilge
2022 International journal of multidisciplinary studies and innovative technologies  
In this study, an FPGA-based CNN architecture using high-level synthesis (HLS) is demonstrated, and a synthesis report is created for Xilinx Zynq-7000 xc7z020-clg484-1 target FPGAs.  ...  Therefore, hardware optimization techniques are compulsory.  ...  We thank the TUBITAK for their support of our research.  ... 
doi:10.36287/ijmsit.6.1.38 fatcat:kd22fxqx2ffk3cqzk2qkmswkey

Thread warping

Greg Stitt, Frank Vahid
2007 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis - CODES+ISSS '07  
We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circuits on FPGAs (field-programmable  ...  Building on dynamic synthesis for single-processor single-thread systems, known as warp processing, thread warping improves performances of multiprocessor systems by speeding up individual threads and  ...  Key details regarding the novel techniques for accelerator synthesis and accelerator instantiation are discussed in the following sections.  ... 
doi:10.1145/1289816.1289841 dblp:conf/codes/StittV07 fatcat:clqzdsg76fedxihf6phlsjcxqm

FPGA as a Hardware Accelerator for Computation Intensive Maximum Likelihood Expectation Maximization Medical Image Reconstruction Algorithm

Murali Ravi, Angu Sewa, Shashidhar T. G., Siva Sankara Sai Sanagapati
2019 IEEE Access  
Here, in this paper, for the first time, we present a parallel structure for hardware acceleration of the MLEM on the mammoth Virtex 7 VC709 FPGA.  ...  The FPGAs are becoming especially popular as hardware accelerators and are well known for their programmability, configurability, and massive parallelism through a large number of Configurable Logic Blocks  ...  ACKNOWLEDGMENT The authors would like to thank the Founder Chancellor Sri Sathya Sai Baba and the University for providing the infrastructure for this endeavour.  ... 
doi:10.1109/access.2019.2932647 fatcat:2mc7ormdevex5drtnjhtg5jcsa

FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

Alexandros Papakonstantinou, Karthik Gururaj, John A. Stratton, Deming Chen, Jason Cong, Wen-Mei W. Hwu
2009 2009 IEEE 7th Symposium on Application Specific Processors  
GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels.  ...  Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool which enables high-abstraction FPGA programming.  ...  Parallelism in C code for FPGA synthesis by AutoPilot is explicitly expressed through parallel function calls (Fig. 3).  ... 
doi:10.1109/sasp.2009.5226333 dblp:conf/sasp/PapakonstantinouGSCCH09 fatcat:ejj3exxnyrfjbfcqwqoqqlf5hu

Automatic Generation of Efficient Accelerators for Reconfigurable Hardware

David Koeplinger, Raghu Prabhakar, Yaqi Zhang, Christina Delimitrou, Christos Kozyrakis, Kunle Olukotun
2016 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)  
Reconfigurable fabrics such as FPGAs are gaining popularity for use in implementing application-specific accelerators, thereby increasing the importance of having good high-level FPGA design tools.  ...  We show that estimates average 4.8% error for logic resources, 6.1% error for runtimes, and are 279 to 6533 times faster than a commercial high-level synthesis tool.  ...  ACKNOWLEDGMENTS The authors thank Maxeler Technologies for their assistance with this paper, and the reviewers for their suggestions.  ... 
doi:10.1109/isca.2016.20 dblp:conf/isca/KoeplingerPZDKO16 fatcat:wxo2ezckinb37lgkyek2lt4c2q

From software to accelerators with LegUp high-level synthesis

Andrew Canis, Jongsok Choi, Blair Fort, Ruolong Lian, Qijing Huang, Nazanin Calagar, Marcel Gort, Jia Jun Qin, Mark Aldham, Tomasz Czajkowski, Stephen Brown, Jason Anderson
2013 2013 International Conference on Compilers, Architecture and Synthesis for Embedded Systems (CASES)  
However, implementing custom hardware accelerators for an application can be difficult and time intensive.  ...  LegUp is an open-source high-level synthesis framework that simplifies the hardware accelerator design process [8].  ...  A popular software pipelining technique is called iterative modulo scheduling [28], which has been adapted for loop pipelining in high-level synthesis by C-to-Verilog [18], PICO [29], and also by  ... 
doi:10.1109/cases.2013.6662524 dblp:conf/cases/CanisCFLHCGQACBA13 fatcat:mkl646vbefa43irr2i725vmh6u

Automatic generation of efficient accelerators for reconfigurable hardware

David Koeplinger, Christina Delimitrou, Raghu Prabhakar, Christos Kozyrakis, Yaqi Zhang, Kunle Olukotun
2016 SIGARCH Computer Architecture News  
Reconfigurable fabrics such as FPGAs are gaining popularity for use in implementing application-specific accelerators, thereby increasing the importance of having good high-level FPGA design tools.  ...  We show that estimates average 4.8% error for logic resources, 6.1% error for runtimes, and are 279 to 6533 times faster than a commercial high-level synthesis tool.  ...  ACKNOWLEDGMENTS The authors thank Maxeler Technologies for their assistance with this paper, and the reviewers for their suggestions.  ... 
doi:10.1145/3007787.3001150 fatcat:e3tcrg2nr5bsbccayasppnkvzm

Highly Parallel Multi-FPGA System Compilation from Sequential C/C++ Code in the AWS Cloud

Kemal Ebcioglu, Ismail San
2022 ACM Transactions on Reconfigurable Technology and Systems  
Therefore, software development for using the multi-chip accelerator hardware is simplified, but the multi-chip accelerator can exhibit extremely high parallelism.  ...  New features of our compiler system include: an ability to parallelize outer loops with loop-carried control dependences, an ability to pipeline an outer loop without fully unrolling its inner loops, and  ...  CONCLUSION We presented an application-specific, high-performance approach for multi-FPGA accelerator system design starting from sequential code.  ... 
doi:10.1145/3507698 fatcat:tizeillzrjhshngssrbtdunegm

FPGA HLS Today: Successes, Challenges, and Opportunities

Jason Cong, Jason Lau, Gai Liu, Stephen Neuendorffer, Peichen Pan, Kees Vissers, Zhiru Zhang
2022 ACM Transactions on Reconfigurable Technology and Systems  
In multiple ways, Year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it went from prototyping to deployment.  ...  We hope that this paper can inspire more research on FPGA HLS and bring it to a new height.  ...  Yichi Zhang from Cornell University for their comments on Sections 2.3 and 2.1.2, respectively, and Marci Baun for editing the paper.  ... 
doi:10.1145/3530775 fatcat:hacv5vmlczbanpiurj73knmrzm

Loop coarsening in C-based High-Level Synthesis

Moritz Schmid, Oliver Reiche, Frank Hannig, Jurgen Teich
2015 2015 IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP), the support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate  ...  Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators.  ...  The Tesla K20 used for this research was donated by the Nvidia Corporation.  ... 
doi:10.1109/asap.2015.7245730 dblp:conf/asap/SchmidRHT15 fatcat:3qglqzjkczh6phu76qonr2mbxm

Synthesis of reconfigurable high-performance multicore systems

Jason Cong, Karthik Gururaj, Guoling Han
2009 Proceeding of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '09  
Experiments show that our proposed techniques provide efficient solutions for real-life benchmarks and generate higher quality of results.  ...  In this paper we also demonstrate that designers can quickly explore a large number of accelerator design choices with the help of high-level synthesis tools.  ...  Loop pipelining: Loop pipelining [16] is an optimization technique to realize temporal parallelism by scheduling different iterations to be executed in an overlapped fashion.  ... 
doi:10.1145/1508128.1508159 dblp:conf/fpga/CongGH09 fatcat:ele2wun3wzaljcbrwarwhsfuem

Accelerating Statistical LOR Estimation for a High-Resolution PET Scanner Using FPGA Devices and a High Level Synthesis Tool

Zhong-Ho Chen, Alvin W.Y. Su, Ming-Ting Sun, Scott Hauck
2011 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines  
In this paper, we use an FPGA platform and a high level synthesis tool, called Impulse C, to speedup a statistical Line Of Reaction (LOR) estimation for a high-resolution Positron Emission Tomography (  ...  We describe some optimization methods for the algorithm using Impulse C. These methods could also be applied to other applications or used to improve the high level synthesis tools.  ...  We also thank Impulse Accelerated Technologies for the access to the Impulse C compiler and Altera for the FPGA compilers.  ... 
doi:10.1109/fccm.2011.15 dblp:conf/fccm/ChenSSH11 fatcat:2utde7earnbyncg3vx43d47bia

Optimised OpenCL workgroup synthesis for hybrid ARM-FPGA devices

Mohammad Hosseinabady, Jose Luis Nunez-Yanez
2015 2015 25th International Conference on Field Programmable Logic and Applications (FPL)  
This paper presents a workgroup synthesis mechanism to compile an OpenCL kernel to FPGA-based accelerators embedded in a multi-core CPU system-on-a-chip (SoC).  ...  Coping with the limited amount of internal memory in embedded FPGAs, the workgroup synthesis utilises a novel data access pattern formulation to describe the parallelism already provided by the OpenCL  ...  ACKNOWLEDGEMENT The authors would like to thank the reviewers for their valuable comments. This research is a part of the ENPOWER project sponsored by EPSRC.  ... 
doi:10.1109/fpl.2015.7294016 dblp:conf/fpl/HosseinabadyN15a fatcat:mrl6eey55rdmbiwepgi7xscgfi

Automating the Design of Processor/Accelerator Embedded Systems with LegUp High-Level Synthesis

Blair Fort, Andrew Canis, Jongsok Choi, Nazanin Calagar, Ruolong Lian, Stefan Hadjis, Yu Ting Chen, Mathew Hall, Bain Syrowik, Tomasz Czajkowski, Stephen Brown, Jason Anderson
2014 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing  
In this paper, we overview the LegUp framework and describe several recent developments: 1) support for an embedded ARM processor, as is available on Altera's recently released SoC FPGA; 2) HLS support  ...  for software parallelization schemes -pthreads and OpenMP; 3) enhancements to LegUp's core HLS algorithms that raise the quality of the auto-generated hardware; and, 4) a preliminary debugging and verification  ...  Note that while the CHStone benchmarks are commonly used in HLS research, they generally do not have opportunities for loop pipelining, and as such, the loop optimizations described in previous sections  ... 
doi:10.1109/euc.2014.26 dblp:conf/euc/FortCCCLHCHSCBA14 fatcat:gm6y5hryjjghjdfws74y7jzody

Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains

Mostafa W. Numan, Braden J. Phillips, Gavin S. Puddy, Katrina Falkner
2020 IEEE Access  
This paper is motivated by the idea of a software tool that can automatically accomplish the task of deploying code, originally written for a conventional computer, to the processors and reconfigurable  ...  computing systems with tightly coupled processors and reconfigurable logic blocks provide great scope to improve software performance by executing each section of code on the processor or custom hardware accelerator  ...  DATAFLOW APPROACH FOR FPGA SYNTHESIS The semantic gap between sequential HLL code and its parallel dataflow representation as FSMs often leads developers to manually optimise hardware accelerators, or  ... 
doi:10.1109/access.2020.3024098 fatcat:hk7s2deq6zgp5fnuwvm5k6jodu