A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is application/pdf.
Approximate FPGA-Based LSTMs Under Computation Time Constraints
[chapter]
2018
Lecture Notes in Computer Science
and pruning, along with a novel FPGA-based LSTM architecture. ...
method, while achieving an average of 25× higher accuracy under the same computation time constraints. ...
accuracy and (iii) run the LSTM under a time constraint with increasing accuracy as a function of computation time budget. ...
doi:10.1007/978-3-319-78890-6_1
fatcat:p2q7b7snnnghrkfet43cqjlj6u
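The time-constrained approximation idea in this entry (accuracy increasing with the computation time budget) can be sketched as an anytime matrix–vector product: weight columns are visited in importance order and the loop is simply cut short when the budget runs out. A minimal NumPy sketch — the function name, the L2-norm ranking heuristic, and the budget parameter are illustrative stand-ins, not the paper's actual pruning scheme or FPGA architecture:

```python
import numpy as np

def anytime_matvec(W, x, budget_fraction):
    """Approximate W @ x using only the most significant columns.

    Columns are visited in descending L2-norm order (a stand-in for
    the paper's pruning criterion); cutting the loop short when the
    time budget expires yields a coarser result, so accuracy grows
    with the computation time budget.
    """
    order = np.argsort(-np.linalg.norm(W, axis=0))   # most important columns first
    k = max(1, int(budget_fraction * W.shape[1]))    # columns the budget can afford
    y = np.zeros(W.shape[0])
    for j in order[:k]:
        y += W[:, j] * x[j]                          # one column contribution per step
    return y

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))
x = rng.normal(size=64)
coarse = anytime_matvec(W, x, 0.25)   # early, rough estimate
full = anytime_matvec(W, x, 1.0)      # full budget reproduces W @ x
```

With the full budget the result coincides with the exact product; smaller budgets trade accuracy for time, which is the progressive behaviour the abstract describes.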
Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars
[article]
2019
arXiv
pre-print
The proposed methodology enables mission-critical systems to make informed decisions even in early stages of the computation, based on approximate LSTM inference, meeting their specifications on safety ...
In this paper, we introduce a progressive inference computing scheme that combines model pruning and computation restructuring leading to the best possible approximation of the result given the available ...
The proposed progressive inference methodology is initially compared with an FPGA-based baseline for LSTM inference to demonstrate its efficacy on making informed predictions under computation time constraints ...
arXiv:1905.00689v2
fatcat:wdx5cbijrfcifpf2hzeqmnx4hy
Deploying Deep Neural Networks in the Embedded Space
[article]
2018
arXiv
pre-print
Approximate FPGA-based LSTMs With a focus on the high-performance deployment of LSTMs under time-constrained settings, [19] presents a framework that comprises an approximate computing scheme together ...
Internally, the framework co-optimises the LSTM approximation and the hardware design in order to meet the computation time constraints. ...
arXiv:1806.08616v1
fatcat:52xugpvnkzeuloufnsfp673oo4
E-RNN: Design Optimization for Efficient Recurrent Neural Networks in FPGAs
[article]
2018
arXiv
pre-print
Experimental results on actual FPGA deployments show that E-RNN achieves a maximum energy efficiency improvement of 37.4× compared with ESE, and more than 2× compared with C-LSTM, under the same accuracy ...
Based on the two observations, we decompose E-RNN into two phases: Phase I, determining the RNN model to reduce computation and storage subject to an accuracy requirement, and Phase II, hardware implementations ...
Based on data dependency of the LSTM model, we propose to adopt multi-stage coarse-grained pipelining (abbreviated as CGPipe) techniques, to achieve maximum performance under the resource constraints. ...
arXiv:1812.07106v1
fatcat:7seb74wkh5aqpcbg5nytwyuicy
An FPGA-Based LSTM Acceleration Engine for Deep Learning Frameworks
2021
Electronics
Experimental results show that, compared with CPU and GPU, the FPGA-based acceleration engine can achieve performance improvements of 8.8 and 2.2 times, and energy efficiency improvements of 16.9 and 9.6 ...
This paper proposes an implementation scheme of the LSTM network acceleration engine based on FPGA and further optimizes the implementation through fixed-point arithmetic, systolic array and lookup table ...
Conclusions: In this paper, we implement an FPGA-based LSTM acceleration engine. ...
doi:10.3390/electronics10060681
fatcat:ctgai3la6nbirp5tyzotqtzjlq
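This entry's engine is optimized through fixed-point arithmetic and lookup tables; the activation part of that idea can be sketched as a precomputed sigmoid table indexed by a fixed-point input, replacing the transcendental `exp()` with a single memory read. All concrete choices below (Q8 format, table range and depth) are illustrative assumptions, not the paper's parameters:

```python
import numpy as np

FRAC_BITS = 8                                  # Q8 fixed point: value = int / 256
LUT_MIN, LUT_MAX, LUT_SIZE = -8.0, 8.0, 256    # table range and depth (illustrative)

# Sigmoid table precomputed offline, like block-RAM contents on an FPGA.
_grid = np.linspace(LUT_MIN, LUT_MAX, LUT_SIZE)
SIGMOID_LUT = np.round(1.0 / (1.0 + np.exp(-_grid)) * (1 << FRAC_BITS)).astype(np.int32)

def to_fixed(v):
    """Real value -> Q8 signed fixed point."""
    return int(round(v * (1 << FRAC_BITS)))

def sigmoid_fixed(x_fixed):
    """Sigmoid of a Q8 input via table lookup; returns a Q8 result.

    The input is clamped to the table range and snapped to the
    nearest entry, so no exponential is evaluated at inference time.
    """
    x = x_fixed / (1 << FRAC_BITS)
    x = min(max(x, LUT_MIN), LUT_MAX)
    idx = int(round((x - LUT_MIN) / (LUT_MAX - LUT_MIN) * (LUT_SIZE - 1)))
    return int(SIGMOID_LUT[idx])

mid = sigmoid_fixed(to_fixed(0.0)) / (1 << FRAC_BITS)   # close to 0.5
```

The table depth trades block-RAM usage against activation accuracy, which is exactly the kind of knob such engines expose.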
Automatic RTL Generation Tool of FPGAs for DNNs
2022
Electronics
We also introduce a long short-term memory (LSTM)-based model to predict performance and generate a DNN model that suits the developer requirements automatically. ...
FPGAs possess the advantages of low latency and high energy efficiency, but the scarcity of FPGA development resources challenges the deployment of DNN-based edge devices. ...
We also used a ZCU02, a ZCU04, and a Kirin 970 CPU (optimized with the Arm Compute Library) to achieve approximately 46 times the acceleration and 55 times the performance improvement. ...
doi:10.3390/electronics11030402
fatcat:mex52yf5brho3pwi4xubdauqte
Mapping Large LSTMs to FPGAs with Weight Reuse
2020
Journal of Signal Processing Systems
Field-Programmable Gate Arrays (FPGAs) have been used to speed up the inference of LSTMs, but FPGA-based LSTM accelerators are limited by the size of on-chip memory and the bandwidth of external memory ...
Compared with CPU and GPU implementations, our FPGA implementation is 23.7 and 1.3 times faster while consuming 208 and 19.2 times lower energy respectively, which shows that our approach enables large ...
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long ...
doi:10.1007/s11265-020-01549-8
fatcat:ph2yd2jlpff3dmkkmigtfig53y
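The bandwidth limitation this entry targets can be emulated as a tiling schedule: each weight tile is loaded once and reused across a whole batch of input vectors before the next tile is fetched, amortising external-memory traffic. A NumPy sketch of the schedule, with illustrative tile sizes and a load counter standing in for external-memory transactions (none of this is the paper's actual dataflow):

```python
import numpy as np

def tiled_matmul(W, X, tile_rows=4, tile_cols=16):
    """Compute W @ X one weight tile at a time, counting tile loads.

    Each tile of W is "loaded" once (on an FPGA: streamed from
    external memory) and immediately reused across every column of X,
    so batching amortises the external-memory bandwidth that limits
    large-LSTM accelerators.  Tile sizes here are illustrative.
    """
    m, k = W.shape
    out = np.zeros((m, X.shape[1]))
    loads = 0
    for r in range(0, m, tile_rows):
        for c in range(0, k, tile_cols):
            tile = W[r:r + tile_rows, c:c + tile_cols]          # one external load
            loads += 1
            out[r:r + tile_rows] += tile @ X[c:c + tile_cols]   # reused for all inputs
    return out, loads

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 32))
X = rng.normal(size=(32, 5))
out, loads = tiled_matmul(W, X)    # (8/4) * (32/16) = 4 tile loads
```

The load count is independent of the batch width of `X`, which is the reuse effect the paper exploits.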
C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs
[article]
2018
arXiv
pre-print
Previous work proposes a pruning-based compression technique to reduce the model size and thus speed up inference on FPGAs. ...
According to the experimental results, C-LSTM achieves up to 18.8X and 33.5X gains for performance and energy efficiency compared with the state-of-the-art LSTM implementation under the same experimental ...
L172004) and National Science Foundation under grants CNS #1704662 and CNS #1739748. We thank all the anonymous reviewers for their feedback. ...
arXiv:1803.06305v1
fatcat:st6r57l6grbu3jprxtqo7bo2q4
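The structured compression behind C-LSTM stores weight blocks as circulant matrices, so a k×k block needs only k values and its matrix–vector product becomes a circular convolution computable via the FFT in O(k log k). A minimal sketch of that core identity, assuming a single circulant block (the block partitioning and FPGA mapping of the paper are not shown):

```python
import numpy as np

def circulant_matvec(first_col, x):
    """Multiply a circulant matrix, stored as its first column, by x.

    The product equals a circular convolution, so it reduces to an
    elementwise multiply in the FFT domain: O(k log k) work and O(k)
    storage per k x k block instead of O(k^2), which is what makes
    block-circulant weights cheap to map onto an FPGA.
    """
    return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

k = 8
rng = np.random.default_rng(2)
col = rng.normal(size=k)
x = rng.normal(size=k)

# Dense reference: column j of the circulant matrix is col rotated by j.
C = np.stack([np.roll(col, j) for j in range(k)], axis=1)
dense = C @ x
fast = circulant_matvec(col, x)
```

The FFT route and the dense product agree to floating-point precision, and on hardware the FFT/IFFT stages map naturally onto pipelined butterfly units.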
Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator
[article]
2021
arXiv
pre-print
To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs. ...
Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency. ...
ACKNOWLEDGMENT This work was supported in part by the United Kingdom EPSRC under Grant EP/L016796/1, Grant EP/N031768/1, Grant EP/P010040/1, Grant EP/V028251/1 and Grant EP/S030069/1 and in part by the ...
arXiv:2106.06048v3
fatcat:o6ucskc6afabbgjdkokmmpqrmi
Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs
[article]
2018
arXiv
pre-print
with the baseline of the IBM TrueNorth processor under the same accuracy constraints, using the MNIST, SVHN, and CIFAR-10 data sets. ...
For FPGA implementations of long short-term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline, under the same accuracy constraints using the MNIST, SVHN, and CIFAR-10 data sets. ...
arXiv:1804.11239v1
fatcat:xzrhegowvvem3ausfk3bj6r52i
A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities
[article]
2019
arXiv
pre-print
We also compare FPGA-based accelerator designs and implementations across different devices and network models, and compare them with CPU and GPU versions. ...
In this paper, we systematically investigate the neural network accelerator based on FPGA. ...
roof do not necessarily achieve higher performance under the constraints of memory bandwidth. ...
arXiv:1901.04988v2
fatcat:k662lznh5nho7i2jiwxtspmhdu
Artificial Neural Networks on FPGAs for Real-Time Energy Reconstruction of the ATLAS LAr Calorimeters
2021
Computing and Software for Big Science
Very good agreement between neural network implementations in FPGA and software based calculations is observed. ...
Real-time processing of digitized pulses sampled at 40 MHz is performed using field-programmable gate arrays (FPGAs). ...
While this could limit the robustness of ... (Fig. 4: Single-cell application of LSTM-based recurrent networks; the LSTM cell and its dense decoder are computed at every BC.) ...
doi:10.1007/s41781-021-00066-y
fatcat:rdqooinzanfsrbhdfgcr3weabu
Accelerating Recurrent Neural Networks for Gravitational Wave Experiments
[article]
2021
arXiv
pre-print
When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency. ...
The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA. ...
Results show latency reduction of up to 12.4 times over the existing FPGA-based LSTM design. ...
arXiv:2106.14089v1
fatcat:rrz7pzy7t5eoxiag2ioyzgxb5y
Identify a Spoofing Attack on an In-Vehicle CAN Bus Based on the Deep Features of an ECU Fingerprint Signal
2020
Smart Cities
The proposed RNN-LSTM model is accelerated on embedded Field-Programmable Gate Arrays (FPGA) to allow for real-time detection despite high computational complexity. ...
To effectively identify spoofing attacks, we propose the authentication of sender identities using a recurrent neural network with long short-term memory units (RNN-LSTM) based on the features of a fingerprint ...
Real-Time RNN-LSTM Acceleration Based on FPGA: To optimize the RNN-LSTM classification, the following constraints in terms of computation optimization and communication requirements [26] need to be taken ...
doi:10.3390/smartcities3010002
fatcat:ctqwsq6gbnbkvaffmk33xwnjiu
A Post-training Quantization Method for the Design of Fixed-Point-Based FPGA/ASIC Hardware Accelerators for LSTM/GRU Algorithms
2022
Computational Intelligence and Neuroscience
The proposed quantization strategy is meant to be a detailed guideline toward the design of custom hardware accelerators for LSTM/GRU-based algorithms to be implemented on FPGA or ASIC devices using fixed-point ...
of customization of compute data paths and memory subsystems, which makes them take the maximum advantage from compression techniques for what concerns area, timing, and power consumption. ...
Acknowledgments: This work has been co-funded by the European Space Agency under contract number 4000129792/20/NL. ...
doi:10.1155/2022/9485933
pmid:35602644
pmcid:PMC9117057
fatcat:ogrrkanppjaozc35rsfht6ibny
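The per-tensor scale selection at the heart of a post-training quantization guideline like this entry's can be sketched in a few lines: pick the largest number of fractional bits such that the tensor's maximum magnitude still fits in the signed integer width, then round. The 8-bit width and the search loop below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def quantize_fixed_point(w, total_bits=8):
    """Post-training quantization of a trained tensor to signed fixed point.

    Chooses the number of fractional bits so the tensor's largest
    magnitude still fits in `total_bits` signed integers, then rounds;
    the maximum reconstruction error is half a least-significant bit.
    """
    max_abs = float(np.max(np.abs(w)))
    frac_bits = total_bits - 1
    # Shrink the fractional part until the largest magnitude fits.
    while round(max_abs * 2 ** frac_bits) > 2 ** (total_bits - 1) - 1:
        frac_bits -= 1
    scale = 2.0 ** frac_bits
    q = np.round(w * scale).astype(np.int32)
    return q, frac_bits

rng = np.random.default_rng(3)
w = rng.normal(scale=0.5, size=100)
q, frac_bits = quantize_fixed_point(w)
w_hat = q / 2.0 ** frac_bits                 # dequantized weights
err = float(np.max(np.abs(w - w_hat)))       # bounded by half an LSB
```

Because the scale is chosen so that no value is clipped, the worst-case error is exactly half a quantization step, which is the kind of bound such a guideline lets a designer trade against bit width.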
Showing results 1 — 15 out of 431 results