311 Hits in 5.6 sec

Mapping Large LSTMs to FPGAs with Weight Reuse

Zhiqiang Que, Yongxin Zhu, Hongxiang Fan, Jiuxi Meng, Xinyu Niu, Wayne Luk
<span title="2020-07-09">2020</span> <i title="Springer Science and Business Media LLC"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/wgplegupdndx5o6decidr2va24" style="color: black;">Journal of Signal Processing Systems</a> </i> &nbsp;
We propose a novel hardware architecture to overcome data dependency and a new blocking-batching strategy to reuse the LSTM weights fetched from external memory to optimize the performance of systems with  ...  LSTM systems to be processed efficiently on FPGAs with high performance and low power consumption.  ...  To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1007/s11265-020-01549-8">doi:10.1007/s11265-020-01549-8</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ph2yd2jlpff3dmkkmigtfig53y">fatcat:ph2yd2jlpff3dmkkmigtfig53y</a> </span>

Accelerating Recurrent Neural Networks for Gravitational Wave Experiments [article]

Zhiqiang Que, Erwei Wang, Umar Marikar, Eric Moreno, Jennifer Ngadiuba, Hamza Javed, Bartłomiej Borzyszkowski, Thea Aarrestad, Vladimir Loncar, Sioni Summers, Maurizio Pierini, Peter Y Cheung (+1 others)
<span title="2021-06-26">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
When compared to other FPGA-based LSTM designs, our design can achieve about 4.92 to 12.4 times lower latency.  ...  The proposed approach has been evaluated based on two LSTM models, targeting a ZYNQ 7045 FPGA and a U250 FPGA.  ...  It is running at 100MHz with 8 timesteps. The weights and input are 16 bits. The bias and LSTM cell status are both 32 bits to keep the accuracy.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.14089v1">arXiv:2106.14089v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rrz7pzy7t5eoxiag2ioyzgxb5y">fatcat:rrz7pzy7t5eoxiag2ioyzgxb5y</a> </span>

H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness [article]

Xinyi Zhang, Cong Hao, Peipei Zhou, Alex Jones, Jingtong Hu
<span title="2022-04-29">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, the system overall latency and energy consumption can be largely  ...  Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H).  ...  In a multi-FPGA system, each FPGA is equipped with a local DRAM, which can be utilized to store model weights and to buffer intermediate activations of two adjacent layers to reduce cross-FPGA data movement  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2204.13852v1">arXiv:2204.13852v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/i6mhgwauebdaxbxscvjrloctgq">fatcat:i6mhgwauebdaxbxscvjrloctgq</a> </span>

Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator [article]

Martin Ferianc, Zhiqiang Que, Hongxiang Fan, Wayne Luk, Miguel Rodrigues
<span title="2021-11-07">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
To address this issue, we propose an FPGA-based hardware design to accelerate Bayesian LSTM-based RNNs.  ...  Compared with GPU implementation, our FPGA-based design can achieve up to 10 times speedup with nearly 106 times higher energy efficiency.  ...  The weights and biases are mapped on-chip automatically into registers when the design is synthesized.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.06048v3">arXiv:2106.06048v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/o6ucskc6afabbgjdkokmmpqrmi">fatcat:o6ucskc6afabbgjdkokmmpqrmi</a> </span>

Remarn: A Reconfigurable Multi-threaded Multi-core Accelerator for Recurrent Neural Networks

Zhiqiang Que, Hiroki Nakahara, Hongxiang Fan, He Li, Jiuxi Meng, Kuen Hung Tsoi, Xinyu Niu, Eriko Nurvitadhi, Wayne Luk
<span title="2022-05-17">2022</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/qfvmhupy5fb6tcrtb2orjpq5e4" style="color: black;">ACM Transactions on Reconfigurable Technology and Systems</a> </i> &nbsp;
When compared with a Tesla V100 GPU implementation, our design achieves 6.5 times better performance and 15.6 times higher power efficiency, showing that our approach contributes to high performance and  ...  energy-efficient FPGA-based multi-RNN inference designs for datacenters.  ...  Apart from batching, [54, 61, 63] introduce novel LSTM weights reuse schemes which utilizes the weights sharing characteristics in different timestep computations in one inference.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3534969">doi:10.1145/3534969</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/g2lj3qnv3bdsxb5mkmwlkoezpu">fatcat:g2lj3qnv3bdsxb5mkmwlkoezpu</a> </span>

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars [article]

Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis
<span title="2019-10-30">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Our experiments on a state-of-the-art driving model for autonomous vehicle navigation demonstrate that the proposed approach can yield outputs with similar quality of result compared to a faithful LSTM  ...  However, the high computational and memory demands of LSTMs introduce challenges in their deployment on latency-critical systems such as self-driving cars which are equipped with limited computational  ...  The goal is to generate an optimised hardware mapping of a given LSTM on a target FPGA, tailored to The concept of progressive inference: Conventional and target behaviour of time-constrained AI systems  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.00689v2">arXiv:1905.00689v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wdx5cbijrfcifpf2hzeqmnx4hy">fatcat:wdx5cbijrfcifpf2hzeqmnx4hy</a> </span>

DNN Dataflow Choice Is Overrated [article]

Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, Steven Emberton Bell, Jeff Ou Setter, Kaidi Cao, Heonjae Ha, Christos Kozyrakis, Mark Horowitz
<span title="2018-09-10">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Compared with Eyeriss system, it achieves up to 4.2X energy improvement for Convolutional Neural Networks (CNNs), 1.6X and 1.8X improvement for Long Short-Term Memories (LSTMs) and multi-layer perceptrons  ...  Many DNN accelerators have been proposed and built using different microarchitectures and program mappings.  ...  Google's TPU also used a simple systolic dataflow on a large 2D array of PEs, which could also be used for MLPs and LSTMs in addition to CNNs [7] .  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1809.04070v1">arXiv:1809.04070v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dtrnyaf6sfcjhfaiq4duvaie4i">fatcat:dtrnyaf6sfcjhfaiq4duvaie4i</a> </span>

Structured Weight Matrices-Based Hardware Accelerators in Deep Neural Networks: FPGAs and ASICs [article]

Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang
<span title="2018-03-28">2018</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
For FPGA implementations on long short term memory (LSTM) networks, the proposed SWM-based LSTM can achieve up to 21X enhancement in performance and 33.5X gains in energy efficiency compared with the baseline  ...  In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based compression techniques for both field programmable  ...  Data-path optimization technique [8] have also been studied to map a limited number of Processing elements (PEs) on FPGA and reuse the mapped PEs by iterating data through them.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1804.11239v1">arXiv:1804.11239v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xzrhegowvvem3ausfk3bj6r52i">fatcat:xzrhegowvvem3ausfk3bj6r52i</a> </span>

hls4ml: deploying deep learning on FPGAs for L1 trigger and Data Acquisition

Nhan Viet Tran, Vladimir Loncar
<span title="2019-11-07">2019</span> <i title="Zenodo"> Zenodo </i> &nbsp;
We map out resource usage and latency versus network architectures, to identify the typical problem complexity that hls4ml could deal with.  ...  There is great potential to improve trigger and DAQ performance with it. However, the exploration of such techniques within the field in low latency/power FPGAs has just begun.  ...  Kazi Asif Ahmed Fuad Recurrent neural networks -Simple RNN, LSTM, GRU Two implementations: -Fully unrolled: -Latency optimized with II=1 -Large resource usage -Static: same resources used for weights  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5281/zenodo.3598988">doi:10.5281/zenodo.3598988</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/jaqavik2cjbkfic4q2ydho3dyq">fatcat:jaqavik2cjbkfic4q2ydho3dyq</a> </span>

Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator [article]

Tian Zhao, Yaqi Zhang, Kunle Olukotun
<span title="2019-09-26">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
to Microsoft Brainwave implementation on a Stratix 10 FPGA.  ...  We evaluate our optimization strategy on such abstraction with DeepBench using a configurable spatial accelerator.  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1909.13654v1">arXiv:1909.13654v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6w2ccglyanfmrohqler55k2pzu">fatcat:6w2ccglyanfmrohqler55k2pzu</a> </span>

A Survey on the Optimization of Neural Network Accelerators for Micro-AI On-Device Inference

Arnab Neelim Mazumder, Jian Meng, Hasib-Al Rashid, Utteja Kallakuri, Xin Zhang, Jae-sun Seo, Tinoosh Mohsenin
<span title="">2021</span> <i title="Institute of Electrical and Electronics Engineers (IEEE)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/cuk7yvxow5gknlxza65us4b7fi" style="color: black;">IEEE Journal on Emerging and Selected Topics in Circuits and Systems</a> </i> &nbsp;
The efficacy of DNNs coincides with the fact that they can provide state-of-the-art inference accuracy for these applications.  ...  techniques, and the realization of the micro-AI models on resource-constrained hardware and different design considerations associated with it.  ...  Authors in [1] use these reuse techniques as, feature map reuse, filter reuse and convolutional reuse.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/jetcas.2021.3129415">doi:10.1109/jetcas.2021.3129415</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/nknpy4eernaeljz2hpqafe7sja">fatcat:nknpy4eernaeljz2hpqafe7sja</a> </span>

Accelerating Neural Network Inference on FPGA-Based Platforms—A Survey

Ran Wu, Xinmin Guo, Jian Du, Junbao Li
<span title="">2021</span> <i title="MDPI AG"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/ikdpfme5h5egvnwtvvtjrnntyy" style="color: black;">Electronics</a> </i> &nbsp;
We summarize how to design a technical route for practical applications based on these strategies. Challenges in the path are discussed to provide guidance for future work.  ...  In this paper, we research neural networks which are involved in the acceleration on FPGA-based platforms.  ...  Researchers avoid mapping the large-weight synapses to the abnormal memristors by deriving a weight-memristor mapping for variations and defects [65] .  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.3390/electronics10091025">doi:10.3390/electronics10091025</a> <a target="_blank" rel="external noopener" href="https://doaj.org/article/92e7eb4228a44c6387f846a1203529d0">doaj:92e7eb4228a44c6387f846a1203529d0</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2xa7dv5hsjbczpvc4w6acdehwu">fatcat:2xa7dv5hsjbczpvc4w6acdehwu</a> </span>

Fast inference of deep neural networks in FPGAs for particle physics

J. Duarte, S. Han, P. Harris, S. Jindariani, E. Kreinar, B. Kreis, J. Ngadiuba, M. Pierini, R. Rivera, N. Tran, Z. Wu
<span title="2018-07-27">2018</span> <i title="IOP Publishing"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/k4sddk2vsfc4lhs45aqu6spxoy" style="color: black;">Journal of Instrumentation</a> </i> &nbsp;
We map out FPGA resource usage and latency versus neural network hyperparameters to identify the problems in particle physics that would benefit from performing neural network inference with FPGAs.  ...  Recent results at the Large Hadron Collider (LHC) have pointed to enhanced physics capabilities through the improvement of the real-time event processing techniques.  ...  Acknowledgements We would like to thank Evan Coleman, Marat Freytsis, and Andreas Hinzmann for assistance in producing datasets in a related work.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1088/1748-0221/13/07/p07027">doi:10.1088/1748-0221/13/07/p07027</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/bq7km3h5gbbe3l5sltzhkn3eeq">fatcat:bq7km3h5gbbe3l5sltzhkn3eeq</a> </span>

A Unified FPGA Virtualization Framework for General-Purpose Deep Neural Networks in the Cloud

Shulin Zeng, Guohao Dai, Hanbo Sun, Jun Liu, Shiyao Li, Guangjun Ge, Kai Zhong, Kaiyuan Guo, Yu Wang, Huazhong Yang
<span title="2022-09-30">2022</span> <i title="Association for Computing Machinery (ACM)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/qfvmhupy5fb6tcrtb2orjpq5e4" style="color: black;">ACM Transactions on Reconfigurable Technology and Systems</a> </i> &nbsp;
On the other hand, current cloud-based DNN accelerators have excessive compilation overhead, especially when scaling out to multi-FPGA systems for multi-tenant sharing, leading to unacceptable compilation  ...  Finally, the extensive experimental results show that the proposed virtualized solutions achieve up to 3.12× and 6.18× higher throughput in the private cloud compared with the static CNN and RNN baseline  ...  This means that data reuse only exists in the column dimension of the weight matrix.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/3480170">doi:10.1145/3480170</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kgrhiohisvcdxm635l3wykgdri">fatcat:kgrhiohisvcdxm635l3wykgdri</a> </span>

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud
<span title="2021-03-10">2021</span> <i title="Institution of Engineering and Technology (IET)"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/34xrdbeizvba5cnrxymhir5cxi" style="color: black;">IET Computers &amp; Digital Techniques</a> </i> &nbsp;
Field Programmable Gate Arrays (FPGAs) are promising platforms for the deployment of large-scale DNN which seek to reach a balance between the above objectives.  ...  Finally, a survey of research works aiming to accelerate the implementation of DNN models on FPGAs is provided.  ...  [59] , LSTM models (Google LSTM and Small LSTM) with 16bit fixed-point data type were implemented on two FPGA platforms resulting in only 1.23% precision degradation.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1049/cdt2.12016">doi:10.1049/cdt2.12016</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/3kl4j5ztl5eahmgv7vetu2egay">fatcat:3kl4j5ztl5eahmgv7vetu2egay</a> </span>