Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing [article]

Akshay Dua, Yixing Li, Fengbo Ren
2020 arXiv   pre-print
This paper presents Systolic-CNN, an OpenCL-defined scalable, run-time-flexible FPGA accelerator architecture, optimized for accelerating the inference of various convolutional neural networks (CNNs) in  ...  Systolic-CNN adopts a highly pipelined and paralleled 1-D systolic array architecture, which efficiently explores both spatial and temporal parallelism for accelerating CNN inference on FPGAs.  ...  INTRODUCTION FPGAs offer superior hardware flexibility and energy efficiency that have attracted many researchers and developers to use FPGAs for accelerating convolutional neural network (CNN) inference  ... 
arXiv:2012.03177v1 fatcat:h5alzshjybhv7kmpmeb46an3qm

ORIGAMI: A Heterogeneous Split Architecture for In-Memory Acceleration of Learning [article]

Hajar Falahati, Pejman Lotfi-Kamran, Mohammad Sadrosadati, Hamid Sarbazi-Azad
2019 arXiv   pre-print
,FPGA,GPU,TPU,etc.) to utilize bandwidth without violating strict area and power budgets.  ...  These compute engines constitute heterogeneous accelerators integrated on logic layer of a 3D-stacked memory. Combination of these compute engines can execute any type of ML algorithms.  ...  Although in-memory accelerators provide high memory bandwidth and consume less energy, they suffer from lack of generality or efficiency.  ... 
arXiv:1812.11473v2 fatcat:vhz3bpqfe5h6nbn2ymh7n6rtxi

Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA [article]

Cheng Fu, Shilin Zhu, Hao Su, Ching-En Lee, Jishen Zhao
2018 arXiv   pre-print
We believe our deployment of BNN on FPGA leads to a promising future of running deep learning models on mobile devices.  ...  Thus there does exist redundancy that can be exploited to further reduce the amount of on-chip computations.  ...  We thank Tien-Pei Chen and Po-Wei Chou for the guidance on FPGA HLS problems. We would also like to thank Wei Shu and Pingping Shao for many helpful discussions.  ... 
arXiv:1810.02068v1 fatcat:6ttblslosnfu5bgpajuvhs3hzy

Accelerating CNN inference on FPGAs: A Survey [article]

Kamel Abdelouahab, Maxime Pelcat, Jocelyn Serot, François Berry
2018 arXiv   pre-print
This paper presents a state-of-the-art of CNN inference accelerators over FPGAs. The computational workloads, their parallelism and the involved memory accesses are analyzed.  ...  The methods and tools investigated in this survey represent the recent trends in FPGA CNN inference accelerators and will fuel the future advances on efficient hardware deep learning.  ...  In particular, we show that current FPGA-based accelerators for CNNs rely on one (or a combination) of three main optimizations to efficiently infer CNNs.  ... 
arXiv:1806.01683v1 fatcat:ftjdwzeuizbbjjytagb7mimrha

An FPGA-Based Hardware Accelerator for CNNs Inference on Board Satellites: Benchmarking with Myriad 2-Based Solution for the CloudScout Case Study

Emilio Rapuano, Gabriele Meoni, Tommaso Pacini, Gianmarco Dinelli, Gianluca Furano, Gianluca Giuffrida, Luca Fanucci
2021 Remote Sensing  
The current work provides a benchmark between the Myriad 2 and our custom hardware accelerator designed for Field Programmable Gate Arrays (FPGAs).  ...  In particular, the application of Deep Learning (DL) techniques on board Earth Observation (EO) satellites might lead to numerous advantages in terms of mitigation of downlink bandwidth constraints, costs  ...  FPGA Accelerators: State of the Art FPGA-based hardware accelerators for CNNs are typically complex to implement due to the high degree of manual optimization required to fully implement the network in  ... 
doi:10.3390/rs13081518 fatcat:6g4jbjtfzzhnpigylcjnjdfd5u

An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
Currently, a large body of research aims to find an efficient on-chip interconnection to achieve low-power and high-bandwidth DNN computing.  ...  ., in/near-memory processing) for the DNN accelerator design. This paper systematically investigates the interconnection networks in modern DNN accelerator designs.  ...  partial sum) and one memory write (for the updated partial sum).  ... 
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi

Efficient and Effective Sparse LSTM on FPGA with Bank-Balanced Sparsity

Shijie Cao, Chen Zhang, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan, Yunxin Liu, Ming Wu, Lintao Zhang
2019 Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '19  
Implemented on Intel Arria-10 FPGA, the BBS accelerator can achieve 750.9 GOPs on sparse LSTM networks with a batch size of 1.  ...  Finally, we design an FPGA accelerator that takes advantage of BBS to eliminate irregular computation and memory accesses.  ...  ACKNOWLEDGEMENTS We would like to thank Ningyi Xu, Wenqiang Wang, Bojie Li and Yun Wang for all technical discussions and valuable suggestions on improving this paper.  ... 
doi:10.1145/3289602.3293898 dblp:conf/fpga/CaoZYXNZLWZ19 fatcat:gac5jdovxngufebpaxa3wdcele

A flexible FPGA accelerator for convolutional neural networks [article]

Kingshuk Majumder, Uday Bondhugula
2019 arXiv   pre-print
In this paper, we propose a CNN accelerator design for inference that is able to exploit all forms of reuse available to minimize off-chip memory access while increasing utilization of available resources  ...  Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound.  ...  We would also like to acknowledge Shubham Nema for integrating the FPGA runtime prototype into Tensorflow.  ... 
arXiv:1912.07284v2 fatcat:7yvo7y2bjvaj3akevfmie4yqva

Pareto Optimal Design Space Exploration for Accelerated CNN on FPGA

Enrico Reggiani, Marco Rabozzi, Anna Maria Nestorov, Alberto Scolari, Luca Stornaiuolo, Marco Santambrogio
2019 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
While Graphics Processing Units (GPUs) are predominantly used for training, solutions for inference often rely on Field Programmable Gate Arrays (FPGAs) since they are more flexible and cost-efficient  ...  Guided from the DSE configurations of the AlexNet network, we quickly identified a candidate design for a Xilinx Virtex-7 XC7VX485T FPGA and achieved a peak throughput of 4.05 ms per image, while we measured  ...  To the best of our knowledge, this is the first work in the literature to exploit time-sharing for optimizing the CNN inference on FPGA.  ... 
doi:10.1109/ipdpsw.2019.00028 dblp:conf/ipps/ReggianiRNSSS19 fatcat:otzpbg2zozdpfgd5dhs3lh5364

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud
2021 IET Computers & Digital Techniques  
Finally, a survey of research works aiming to accelerate the implementation of DNN models on FPGAs is provided.  ...  However, it is necessary to guarantee the best performance when designing hardware accelerators for DL applications to run at full speed, despite the constraints of low power, high accuracy and throughput  ...  [98], the authors designed DLAU, an accelerator architecture for large-scale DNNs by exploiting data reuse in order to reduce the memory bandwidth requirements.  ... 
doi:10.1049/cdt2.12016 fatcat:3kl4j5ztl5eahmgv7vetu2egay

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs [article]

Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini
2017 arXiv   pre-print
This work presents NEURAghe, a flexible and efficient hardware/software solution for the acceleration of CNNs on Zynq SoCs.  ...  This methodology opens the way for cooperative heterogeneous computing: while the accelerator takes care of the bulk of the CNN workload, the ARM cores can seamlessly execute hard-to-accelerate parts of  ...  Most extreme approaches to quantization exploit ternary [28] or binary [33] neural-networks accelerators for FPGA.  ... 
arXiv:1712.00994v1 fatcat:s2e2eaafpbcffnutlzktggcooe

An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Muhammad Shafique, Guido Masera, Maurizio Martina
2020 Future Internet  
Their ability to go beyond human precision has made these networks a milestone in the history of AI.  ...  Deep Neural Networks (DNNs) are nowadays a common practice in most of the Artificial Intelligence (AI) applications.  ...  Thus each MAC requires three memory accesses: two for the factors and one for the writeback of the product.  ... 
doi:10.3390/fi12070113 fatcat:heyq4l3rkrdc5p55xdbhsh4jxu

EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference

Chang Gao, Antonio Rios-Navarro, Xi Chen, Shih-Chii Liu, Tobi Delbruck
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
We propose a lightweight Gated Recurrent Unit (GRU)-based RNN accelerator called EdgeDRNN that is optimized for low-latency edge RNN inference with batch size of 1.  ...  Low-latency, low-power portable recurrent neural network (RNN) accelerators offer powerful inference capabilities for real-time applications such as IoT, robotics, and human-machine interaction.  ...  Meanwhile, EdgeDRNN is designed to match the external memory bandwidth available on any FPGA platform with external DRAM.  ... 
doi:10.1109/jetcas.2020.3040300 fatcat:6po265l6rrh4zd35ymyzvj3mce

An FPGA Overlay for CNN Inference with Fine-grained Flexible Parallelism

Ziaul Choudhury, Shashwat Shrivastava, Lavanya Ramapantulu, Suresh Purini
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
In this article, we propose an FPGA overlay for efficient processing of CNNs that can be scaled based on the available compute and memory resources of the FPGA.  ...  FPGA-based hardware designers address the structural variability issue by generating a network-specific accelerator for a single network or a class of networks.  ...  As the range of supported networks grows so does the complexity of managing the different accelerators on the FPGA.  ... 
doi:10.1145/3519598 fatcat:7twwr7yn4jbwpdtwnuukgztfs4

Optimizing Temporal Convolutional Network inference on FPGA-based accelerators [article]

Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni
2020 arXiv   pre-print
While FPGA-based inference accelerators for classic CNNs are widespread, literature is lacking in a quantitative evaluation of their usability on inference for TCN models.  ...  In this paper we present such an evaluation, considering a CNN accelerator with specific features supporting TCN kernels as a reference and a set of state-of-the-art TCNs as a benchmark.  ...  To the best of our knowledge, there are no published FPGA-based accelerators tuned to speed-up inference for generic Temporal Convolutional Networks.  ... 
arXiv:2005.03775v1 fatcat:wexyiv7pnvbtpf2ldipgnpqdga