Hardware design space exploration using HercuLeS HLS

Nikolaos Kavvadias, Kostas Masselos
2013 Proceedings of the 17th Panhellenic Conference on Informatics - PCI '13  
IP integration, d) backend C code generation for compiled simulation, and e) an exemplary case of DSE.  ...  HercuLeS is an extensible high-level synthesis (HLS) environment.  ...  since it uses a graph-based back-end; d) open specifications such as Graphviz [9] and NAC are used throughout the HLS process, and e) the generated HDL code is completely vendor- and technology-independent  ... 
doi:10.1145/2491845.2491865 dblp:conf/pci/KavvadiasM13 fatcat:4o6s4hyztff3hmbc4p72b63rvy

A Toolchain for Dynamic Function Off-load on CPU-FPGA Platforms

Takaaki Miyajima, David Thomas, Hideharu Amano
2015 Journal of Information Processing  
This new toolchain for accelerating application on CPU-FPGA platforms, called Courier-FPGA, extracts runtime information from a running target binary, and re-constructs the function call graph including  ...  Then, it synthesizes hardware modules on the FPGA and makes software functions on CPU by using Pipeline Generator.  ...  Acknowledgments The present study is supported in part by the JST/CREST program entitled "Research and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale  ... 
doi:10.2197/ipsjjip.23.153 fatcat:2pb5qh2ocjaiver3f2bpiykyba

Pushing the Level of Abstraction of Digital System Design: a Survey on How to Program FPGAs

Emanuele Del Sozzo, Davide Conficconi, Alberto Zeni, Mirko Salaris, Donatella Sciuto, Marco D. Santambrogio
2022 ACM Computing Surveys  
They are state-of-the-art for prototyping, telecommunications, embedded, and an emerging alternative for cloud-scale acceleration.  ...  We review these abstraction solutions, provide a timeline, and propose a taxonomy for each abstraction trend: programming models for HDLs; IP-based or System-based toolchains for HLS; application, architecture  ...  Acknowledgements: The authors are grateful for feedback from Reviewers and NECSTLab members, with a particular mention to A. Damiani, A. Parravicini, E. D'Arnese, F. Carloni, F. Peverelli, and R.  ... 
doi:10.1145/3532989 fatcat:nsk5lwvt3vba5fbxmaj7sgpwru

HipaccVX: Wedding of OpenVX and DSL-based Code Generation [article]

M. Akif Özkan, Burak Ok, Bo Qiao, Jürgen Teich, Frank Hannig
2020 arXiv   pre-print
These optimizations can double the throughput on an Nvidia GTX GPU and decrease the resource usage of a Xilinx Zynq FPGA by 50% for our benchmarks.  ...  Yet, the OpenVX algorithm space is constrained to a small set of vision functions. This hinders accelerating computations that are not included in the standard.  ...  The filter graph function (Line 21) Algorithm 1: Graph Analysis for Dead Computation Elimination. Input: G_app (application graph), D_nv (set of non-virtual data nodes). Output: G_filt (optimized application  ... 
arXiv:2008.11476v1 fatcat:e4yyu4ei7nayjma5rpt6p3nmei

HipaccVX: wedding of OpenVX and DSL-based code generation

M. Akif Özkan, Burak Ok, Bo Qiao, Jürgen Teich, Frank Hannig
2020 Journal of Real-Time Image Processing  
These optimizations can double the throughput on an Nvidia GTX GPU and decrease the resource usage of a Xilinx Zynq FPGA by 50% for our benchmarks.  ...  Yet, the OpenVX algorithm space is constrained to a small set of vision functions. This hinders accelerating computations that are not included in the standard.  ...  Listing 1 shows an example OpenVX code for a simple edge detection algorithm, for which the application graph is shown in Fig. 1.  ... 
doi:10.1007/s11554-020-01015-5 fatcat:iowzgiohnvc3beo4at6aamcb5y

SNNAP: Approximate computing on programmable SoCs via neural acceleration

Thierry Moreau, Mark Wyse, Jacob Nelson, Adrian Sampson, Hadi Esmaeilzadeh, Luis Ceze, Mark Oskin
2015 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
We describe the design and implementation of SNNAP, a flexible FPGA-based neural accelerator for approximate programs.  ...  No hardware expertise is required to accelerate software with SNNAP, so the effort required can be substantially lower than custom hardware design for an FPGA fabric and possibly even lower than current  ...  The authors thank Eric Chung for his help on prototyping accelerators on the Zynq.  ... 
doi:10.1109/hpca.2015.7056066 dblp:conf/hpca/MoreauWNSECO15 fatcat:kcr5mngnrncbfmeanik46g3t4i

Hardware Compilation of Deep Neural Networks: An Overview

Ruizhe Zhao, Shuanglong Liu, Ho-Cheung Ng, Erwei Wang, James J. Davis, Xinyu Niu, Xiwei Wang, Huifeng Shi, George A. Constantinides, Peter Y. K. Cheung, Wayne Luk
2018 2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Deploying a deep neural network model on a reconfigurable platform, such as an FPGA, is challenging due to the enormous design spaces of both network models and hardware design.  ...  Design templates for neural network accelerators are studied with a specific focus on their derivation methodologies.  ...  In other implementations, the authors applied polyhedral analysis for DNN acceleration on non-FPGA platforms.  ... 
doi:10.1109/asap.2018.8445088 dblp:conf/asap/ZhaoLNWDNWSCCL18 fatcat:v5txrrsfifa6bah2oksjdlrsgi

Virtualized Execution Runtime for FPGA Accelerators in the Cloud

Mikhail Asiatici, Nithin George, Kizheppatt Vipin, Suhaib A. Fahmy, Paolo Ienne
2017 IEEE Access  
Parallel kernels are used to generate an HLS description, which is processed by the HLS tool (Vivado HLS 2015.4.2 in our case) to generate an RTL description of the hardware accelerators.  ...  Lastly, a third approach supports direct RTL description of accelerators, bypassing the HLS generation stage.  ... 
doi:10.1109/access.2017.2661582 fatcat:donw4yrggjdftkgob3x5l5r664

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions.  ...  The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for  ...  Most of the efforts are focused on digital CMOS technology, such as implementations based on general-purpose TPUs/GPUs, FPGAs, and more specialized ML hardware accelerators.  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

HIR: An MLIR-based Intermediate Representation for Hardware Accelerator Description [article]

Kingshuk Majumder, Uday Bondhugula
2021 arXiv   pre-print
Though FPGAs are an ideal target for energy efficient custom accelerators, the difficulty of hardware design and the lack of vendor agnostic, standardized hardware compilation infrastructure has hindered  ...  Our implementation shows that the code generation time of the HIR code generator is on average 1112x lower than that of Xilinx Vivado HLS on a range of kernels without a compromise on the quality of the  ...  For example, the generated accelerator may be deployed as a part of a larger design containing PCIe controllers, DRAM controllers and soft CPU cores (CPU implemented on an FPGA).  ... 
arXiv:2103.00194v1 fatcat:vwv7jfr2ofgxjih7uamqxvv4xe

FPGA-based accelerator development for non-engineers

David Uliana, Peter Athanas, Krzysztof Kepa
2014 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14)  
However, domain experts, who are the brains behind this processing, typically lack the skills required to build FPGA-based hardware accelerators ideal for their applications, as traditional development  ...  The efficacy of these flows in extending FPGA-based acceleration to non-engineers in the life sciences was informally tested at two separate instances of an NSF-funded summer workshop, organized and hosted  ...  In general, however, the use of a non-standard language as the entry format of an HLS tool creates a barrier to widespread adoption [25].  ... 
doi:10.1109/reconfig.2014.7032522 dblp:conf/reconfig/UlianaAK14 fatcat:s6krmx2zerbffnpagsanagvjwy

High-Level Synthesis in the Delft Workbench Hardware/Software Co-design Tool-Chain

Razvan Nane, Vlad Mihai Sima, Cuong Pham Quoc, Fernando Goncalves, Koen Bertels
2014 2014 12th IEEE International Conference on Embedded and Ubiquitous Computing  
to an increasing attention for HLS tool development and optimization from both the academia as well as the industry.  ...  This advantage, coupled with the increasing number of available heterogeneous platforms that loosely couple general-purpose processors with Field-Programmable Gate Array (FPGA)-based co-processors, led  ...  Reconfigurable fabrics such as FPGAs can be used as stand-alone processing units or in combination with a General-Purpose Processor (GPP).  ... 
doi:10.1109/euc.2014.28 dblp:conf/euc/NaneSPGB14 fatcat:elr5fcxj6bec7btbpuywsvnfnq

Are Coarse-Grained Overlays Ready for General Purpose Application Acceleration on FPGAs?

Abhishek Kumar Jain, Douglas L. Maskell, Suhaib A. Fahmy
2016 2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech)  
Section V examines the use of an FPGA overlay for general purpose application acceleration within a hybrid FPGA. Finally, we conclude in Section VI.  ... 
doi:10.1109/dasc-picom-datacom-cyberscitec.2016.110 dblp:conf/dasc/JainMF16 fatcat:gmiz7uunpbaatjryzjiozj24om

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way [article]

Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou
2018 arXiv   pre-print
FPGA-based heterogeneous architectures provide programmers with the ability to customize their hardware accelerators for flexible acceleration of many workloads.  ...  We show that for a broad class of accelerator benchmarks from MachSuite, the proposed best-effort guideline improves the FPGA accelerator performance by 42-29,030x.  ...  accelerator design expert.  ... 
arXiv:1807.01340v1 fatcat:6ocpzvp2cvgkninbtyvvyk7yiu

DeCO: A DSP Block Based FPGA Accelerator Overlay with Low Overhead Interconnect

Abhishek Kumar Jain, Xiangwei Li, Pranjul Singhai, Douglas L. Maskell, Suhaib A. Fahmy
2016 2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)  
However, achieving the desired performance often still requires detailed low-level design engineering effort that is difficult for non-experts.  ...  , long compilation times, and poor design productivity are major issues preventing the mainstream adoption of FPGA based accelerators in general purpose computing [2].  ... 
doi:10.1109/fccm.2016.10 dblp:conf/fccm/JainLSMF16 fatcat:q5hlrqeoyrezll4o7qtsfnyp7e