431 Hits in 3.4 sec

Approximate FPGA-Based LSTMs Under Computation Time Constraints [chapter]

Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis
2018 Lecture Notes in Computer Science  
and pruning, along with a novel FPGA-based LSTM architecture.  ...  method, while achieving an average of 25× higher accuracy under the same computation time constraints.  ...  accuracy and (iii) run the LSTM under a time constraint with increasing accuracy as a function of computation time budget.  ... 
doi:10.1007/978-3-319-78890-6_1 fatcat:p2q7b7snnnghrkfet43cqjlj6u

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars [article]

Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis
2019 arXiv   pre-print
The proposed methodology enables mission-critical systems to make informed decisions even in early stages of the computation, based on approximate LSTM inference, meeting their specifications on safety  ...  In this paper, we introduce a progressive inference computing scheme that combines model pruning and computation restructuring leading to the best possible approximation of the result given the available  ...  The proposed progressive inference methodology is initially compared with an FPGA-based baseline for LSTM inference to demonstrate its efficacy on making informed predictions under computation time constraints  ... 
arXiv:1905.00689v2 fatcat:wdx5cbijrfcifpf2hzeqmnx4hy

Deploying Deep Neural Networks in the Embedded Space [article]

Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis
2018 arXiv   pre-print
Approximate FPGA-based LSTMs With a focus on the high-performance deployment of LSTMs under time-constrained settings, [19] presents a framework that comprises an approximate computing scheme together  ...  Internally, the framework co-optimises the LSTM approximation and the hardware design in order to meet the computation time constraints.  ... 
arXiv:1806.08616v1 fatcat:52xugpvnkzeuloufnsfp673oo4

E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs [article]

Zhe Li, Caiwen Ding, Siyue Wang, Wujie Wen, Youwei Zhuo, Chang Liu, Qinru Qiu, Wenyao Xu, Xue Lin, Xuehai Qian, Yanzhi Wang
2018 arXiv   pre-print
Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy  ...  Based on the two observations, we decompose E-RNN into two phases: Phase I on determining the RNN model to reduce computation and storage subject to the accuracy requirement, and Phase II on hardware implementations  ...  Based on data dependency of the LSTM model, we propose to adopt multi-stage coarse-grained pipelining (abbreviated as CGPipe) techniques, to achieve maximum performance under the resource constraints.  ...
arXiv:1812.07106v1 fatcat:7seb74wkh5aqpcbg5nytwyuicy

An FPGA-Based LSTM Acceleration Engine for Deep Learning Frameworks

Dazhong He, Junhua He, Jun Liu, Jie Yang, Qing Yan, Yang Yang
2021 Electronics  
Experimental results show that, compared with CPU and GPU, the FPGA-based acceleration engine can achieve performance improvement of 8.8 and 2.2 times and energy efficiency improvement of 16.9 and 9.6  ...  This paper proposes an implementation scheme of the LSTM network acceleration engine based on FPGA and further optimizes the implementation through fixed-point arithmetic, systolic array and lookup table  ...  Conclusions In this paper, we implement an FPGA-based LSTM acceleration engine.  ... 
doi:10.3390/electronics10060681 fatcat:ctgai3la6nbirp5tyzotqtzjlq

Automatic RTL Generation Tool of FPGAs for DNNs

Seojin Jang, Wei Liu, Sangun Park, Yongbeom Cho
2022 Electronics  
We also introduce a long short-term memory (LSTM)-based model to predict performance and generate a DNN model that suits the developer's requirements automatically.  ...  FPGAs possess the advantages of low latency and high energy efficiency, but the scarcity of FPGA development resources challenges the deployment of DNN-based edge devices.  ...  We also used a ZCU102, a ZCU104, and a Kirin 970 CPU (with Arm Compute Library optimizations) to achieve approximately 46 times the acceleration and 55 times the performance improvement.  ...
doi:10.3390/electronics11030402 fatcat:mex52yf5brho3pwi4xubdauqte

Mapping Large LSTMs to FPGAs with Weight Reuse

Zhiqiang Que, Yongxin Zhu, Hongxiang Fan, Jiuxi Meng, Xinyu Niu, Wayne Luk
2020 Journal of Signal Processing Systems  
Field-Programmable Gate Arrays (FPGAs) have been used to speed up the inference of LSTMs, but FPGA-based LSTM accelerators are limited by the size of on-chip memory and the bandwidth of external memory  ...  Compared with CPU and GPU implementations, our FPGA implementation is 23.7 and 1.3 times faster while consuming 208 and 19.2 times lower energy respectively, which shows that our approach enables large  ...  Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long  ... 
doi:10.1007/s11265-020-01549-8 fatcat:ph2yd2jlpff3dmkkmigtfig53y

C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs [article]

Shuo Wang, Zhe Li, Caiwen Ding, Bo Yuan, Yanzhi Wang, Qinru Qiu, Yun Liang
2018 arXiv   pre-print
The previous work proposes to use a pruning-based compression technique to reduce the model size and thus speed up inference on FPGAs.  ...  According to the experimental results, C-LSTM achieves up to 18.8X and 33.5X gains for performance and energy efficiency compared with the state-of-the-art LSTM implementation under the same experimental  ...  L172004) and National Science Foundation under grants CNS #1704662 and CNS #1739748. We thank all the anonymous reviewers for their feedback.  ...
arXiv:1803.06305v1 fatcat:st6r57l6grbu3jprxtqo7bo2q4

Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator [article]

Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk, Miguel Rodrigues
2021 arXiv   pre-print
To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs.  ...  Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency.  ...  ACKNOWLEDGMENT This work was supported in part by the United Kingdom EPSRC under Grant EP/L016796/1, Grant EP/N031768/1, Grant EP/P010040/1, Grant EP/V028251/1 and Grant EP/S030069/1 and in part by the  ...
arXiv:2106.06048v3 fatcat:o6ucskc6afabbgjdkokmmpqrmi

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs [article]

Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang
2018 arXiv   pre-print
with the baseline of IBM TrueNorth processor under same accuracy constraints using the data set of MNIST, SVHN, and CIFAR-10.  ...  For FPGA implementations on long short term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline  ...
arXiv:1804.11239v1 fatcat:xzrhegowvvem3ausfk3bj6r52i

A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities [article]

Teng Wang, Chao Wang, Xuehai Zhou, Huaping Chen
2019 arXiv   pre-print
We also compared the design and implementation of the accelerator based on FPGA under different devices and network models and compared it with the versions of CPU and GPU.  ...  In this paper, we systematically investigate the neural network accelerator based on FPGA.  ...  roof do not necessarily achieve higher performance under the constraints of memory bandwidth.  ... 
arXiv:1901.04988v2 fatcat:k662lznh5nho7i2jiwxtspmhdu

Artificial Neural Networks on FPGAs for Real-Time Energy Reconstruction of the ATLAS LAr Calorimeters

Georges Aad, Anne-Sophie Berthold, Thomas Calvet, Nemer Chiedde, Etienne Marie Fortin, Nick Fritzsche, Rainer Hentges, Lauri Antti Olavi Laatu, Emmanuel Monnier, Arno Straessner, Johann Christoph Voigt
2021 Computing and Software for Big Science  
Very good agreement between neural network implementations in FPGA and software-based calculations is observed.  ...  Real-time processing of digitized pulses sampled at 40 MHz is performed using field-programmable gate arrays (FPGAs).  ...  While this could limit the robustness of  ...  Fig. 4: Single-cell application of LSTM-based recurrent networks. The LSTM cell and its dense decoder are computed at every BC.  ...
doi:10.1007/s41781-021-00066-y fatcat:rdqooinzanfsrbhdfgcr3weabu

Accelerating Recurrent Neural Networks for Gravitational Wave Experiments [article]

Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir Loncar, Sioni Summers, Maurizio Pierini, Peter Y Cheung (+1 others)
2021 arXiv   pre-print
When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency.  ...  The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA.  ...  Results show latency reduction of up to 12.4 times over the existing FPGA-based LSTM design.  ... 
arXiv:2106.14089v1 fatcat:rrz7pzy7t5eoxiag2ioyzgxb5y

Identify a Spoofing Attack on an In-Vehicle CAN Bus Based on the Deep Features of an ECU Fingerprint Signal

Yang, Duan, Tehranipoor
2020 Smart Cities  
The proposed RNN-LSTM model is accelerated on embedded Field-Programmable Gate Arrays (FPGAs) to allow for real-time detection despite high computational complexity.  ...  To effectively identify spoofing attacks, we propose the authentication of sender identities using a recurrent neural network with long short-term memory units (RNN-LSTM) based on the features of a fingerprint  ...  Real-Time RNN-LSTM Acceleration Based on FPGA To optimize the RNN-LSTM classification, the following constraints in terms of computation optimization and communication requirements [26] need to be taken  ...
doi:10.3390/smartcities3010002 fatcat:ctqwsq6gbnbkvaffmk33xwnjiu

A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms

Emilio Rapuano, Tommaso Pacini, Luca Fanucci, Suneet Kumar Gupta
2022 Computational Intelligence and Neuroscience  
The proposed quantization strategy is meant to be a detailed guideline toward the design of custom hardware accelerators for LSTM/GRU-based algorithms to be implemented on FPGA or ASIC devices using fixed-point  ...  of customization of compute data paths and memory subsystems, which makes them take the maximum advantage from compression techniques for what concerns area, timing, and power consumption.  ...  Acknowledgments This work has been co-funded by the European Space Agency under contract number 4000129792/20/NL.  ...
doi:10.1155/2022/9485933 pmid:35602644 pmcid:PMC9117057 fatcat:ogrrkanppjaozc35rsfht6ibny
Showing results 1 — 15 out of 431 results