169 Hits in 4.7 sec

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani Srivastava, Rajesh Gupta, Zhiru Zhang
2017 Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays - FPGA '17  
However, large GPUs outperform modern FPGAs in throughput, and the existence of compatible deep learning frameworks gives GPUs a significant advantage in programmability.  ...  The accelerator outperforms existing FPGA-based CNN accelerators in GOPS as well as energy and resource efficiency.  ...  The Tesla K40 GPU used for this research was donated by the NVIDIA Corporation.  ... 
doi:10.1145/3020078.3021741 fatcat:6yzshksxx5eanowroqnvez2kze

Combining Deep Learning Accelerators and Graphics Processing Unit for Efficient Computing

M.O. Agbaje, Daniel O, O Mosinmiloluwa
2020 Zenodo  
Hardware used in implementing artificial neural networks is vital, as it plays a major role in the speed and efficiency of the whole system.  ...  It is also a stated fact that the artificial intelligence industry is at a crossroads over which processor (Deep Learning Accelerators or Graphics Processing Units) best fits the portfolio for the most powerful  ...  Can FPGAs beat GPUs in accelerating next-generation deep neural networks?. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (pp. 5-14). 2.  ... 
doi:10.5281/zenodo.4030329 fatcat:stwiww2ugzbaha3hf62pqwkjp4

Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going [article]

Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides
2019 arXiv   pre-print
Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency.  ...  Deep neural networks have proven to be particularly effective in visual and audio recognition tasks.  ...  Intel Cascade Lake CPUs provide so-called Vector Neural Network Instructions in 16- and 8-bit formats [61] , while Nvidia Turing GPUs support TensorRT, a deep learning platform integrable with TensorFlow  ... 
arXiv:1901.06955v3 fatcat:rkgo2oisdrgv3dtnbtlldlkpba

FPGA Acceleration of Recurrent Neural Network Based Language Model

Sicheng Li, Chunpeng Wu, Hai Li, Boxun Li, Yu Wang, Qinru Qiu
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
However, the use of RNNLM has been greatly hindered by its high computation cost in training. This work presents an FPGA implementation framework for RNNLM training acceleration.  ...  Recurrent neural network (RNN) based language model (RNNLM) is a biologically inspired model for natural language processing.  ...  Deep neural networks (DNNs) have also demonstrated great potential in the domain of language models [10] .  ... 
doi:10.1109/fccm.2015.50 dblp:conf/fccm/LiWLLWQ15 fatcat:dk66yqbdfvc2niu2acs3rwfn3q

GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware [article]

Ananda Samajdar, Parth Mannan, Kartikay Garg, Tushar Krishna
2018 arXiv   pre-print
EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropagation training.  ...  ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE.  ...  The trained model hence obtained is then deployed in the cloud or at the edge over inference accelerators (such as GPUs, FPGAs, or ASICs).  ... 
arXiv:1808.01363v2 fatcat:fqizvyhwyzeqpdtwkhg4teikgq

Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices

Xinheng Liu, Dae Hee Kim, Chang Wu, Deming Chen
2018 Proceedings of the 20th System Level Interconnect Prediction Workshop on - SLIP '18  
Targeting convolutional neural networks (CNNs), we adopt the high-level synthesis (HLS) design methodology and explore various optimization and synthesis techniques to optimize the design on an FPGA.  ...  Recently, as machine learning algorithms have become more practical, there has been much effort to implement them on devices that can be used in our daily lives.  ...  INTRODUCTION In recent years, we have seen the boom of deep convolutional neural networks in solving artificial intelligence tasks.  ... 
doi:10.1145/3225209.3225214 dblp:conf/dac/LiuKWC18 fatcat:4l3u7zzk6ndxzjy4pgbk63wjcm

A Residual Network and FPGA Based Real-Time Depth Map Enhancement System

Zhenni Li, Haoyi Sun, Yuliang Gao, Jiao Wang
2021 Entropy  
In this FPGA design, the intensity image and depth image are captured by the dual-camera synchronous acquisition system as the input to the neural network.  ...  In this paper, we propose a real-time depth map enhancement system based on a residual network which uses dual channels to process depth maps and intensity maps respectively and cancels the preprocessing  ...  [14] proposed a deep neural network structure that implements end-to-end mapping between low-resolution depth maps and high-resolution depth maps and proved that deep neural networks were superior to  ... 
doi:10.3390/e23050546 pmid:33924967 fatcat:yu725ezuibhoxotiz7qnf7gct4

FPGA Architecture for Deep Learning and its application to Planetary Robotics [article]

Pranay Gankidi, Jekan Thangavelautham
2017 arXiv   pre-print
This paper presents an FPGA implementation of Q-learning with Artificial Neural Networks (ANN).  ...  This method matches the massive parallelism inherent in neural network software with the fine-grain parallelism of FPGA hardware, thereby dramatically reducing processing time.  ...  In our work here, we combine deep neural networks with Q-learning and implement it on board an FPGA.  ... 
arXiv:1701.07543v1 fatcat:ai4uujhxg5exjiipssy65giffi

NullHop: A Flexible Convolutional Neural Network Accelerator Based on Sparse Representations of Feature Maps [article]

Alessandro Aimar, Hesham Mostafa, Enrico Calabrese, Antonio Rios-Navarro, Ricardo Tapiador-Morales, Iulia-Alexandra Lungu, Moritz B. Milde, Federico Corradi, Alejandro Linares-Barranco, Shih-Chii Liu, Tobi Delbruck
2018 arXiv   pre-print
Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks.  ...  NullHop can process up to 128 input and 128 output feature maps per layer in a single pass.  ...  Exploiting temporal sparsity can greatly reduce recurrent neural network memory access [37] , [38] .  ... 
arXiv:1706.01406v2 fatcat:epm6g7fgdnesbmni4i4gjsg75a

Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale [article]

Forrest Iandola
2016 arXiv   pre-print
In recent years, the research community has discovered that deep neural networks (DNNs) and convolutional neural networks (CNNs) can yield higher accuracy than all previous solutions to a broad array of  ...  In this dissertation, we develop a methodology that enables systematic exploration of the design space of CNNs. Our methodology comprises the following four themes. 1.  ...  A high-accuracy deep neural network (DNN) model such as GoogLeNet [79] can take weeks to train on a modern GPU.  ... 
arXiv:1612.06519v1 fatcat:jwo2gyfjvfh3lbkfdntctx24o4

Wavefront parallelization of recurrent neural networks on multi-core architectures

Robin Kumar Sharma, Marc Casas
2020 Proceedings of the 34th ACM International Conference on Supercomputing  
Recurrent neural networks (RNNs) are widely used for natural language processing, time-series prediction, or text analysis tasks.  ...  We use fine-grained pipeline parallelism in terms of wavefront computations to accelerate multi-layer RNNs running on multi-core CPUs.  ...  INTRODUCTION Neural networks composed of multiple layers are called Deep Neural Networks (DNNs) [38] .  ... 
doi:10.1145/3392717.3392762 fatcat:tabbmlo7hrdvbaugs7om74qx7a

A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications? [article]

Ian Colbert, Jake Daly, Ken Kreutz-Delgado, Srinjoy Das
2021 arXiv   pre-print
We propose this FPGA-based accelerator to be used for Deconvolutional Neural Network (DCNN) inference in low-power edge computing applications.  ...  On these networks, our FPGA design achieves a higher throughput to power ratio with lower run-to-run variation when compared to the NVIDIA Jetson TX1 edge computing GPU.  ...  ACKNOWLEDGEMENTS This work was supported in part by NSF awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, the University of California Office of the President, and the California Institute for  ... 
arXiv:2102.00294v2 fatcat:x6gzg7v2anhprauyrghgowlkcm

High performance reconfigurable computing for numerical simulation and deep learning

Lin Gan, Ming Yuan, Jinzhe Yang, Wenlai Zhao, Wayne Luk, Guangwen Yang
2020 CCF Transactions on High Performance Computing  
The above advantages, together with the flexibility of their on-chip resources supporting, for example, multiple numerical precisions, have provided a promising candidate for both current- and next-generation  ...  In all, this report summarises and analyses recent FPGA-related efforts in HPC.  ...  Acknowledgements This work was supported in part by the National  ... 
doi:10.1007/s42514-020-00032-x fatcat:mbnb73zazzgohhe4quhuqlryky

A comprehensive review of Binary Neural Network [article]

Chunyu Yuan, Sos S. Agaian
2021 arXiv   pre-print
With BNNs, a significant amount of storage, network complexity and energy consumption can be reduced, and neural networks can be implemented more efficiently in embedded applications.  ...  The Binary Neural Network (BNN) method is an extreme application of convolutional neural network (CNN) parameter quantization.  ...  TVM is an open source deep learning compiler framework for diverse hardware environments including CPUs, GPUs, and deep learning accelerators.  ... 
arXiv:2110.06804v2 fatcat:jwr7mimfpvad5pre2qhmguhw4i

RT-RCG: Neural Network and Accelerator Search Towards Effective and Real-time ECG Reconstruction from Intracardiac Electrograms [article]

Yongan Zhang, Anton Banta, Yonggan Fu, Mathews M. John, Allison Post, Mehdi Razavi, Joseph Cavallaro, Behnaam Aazhang, Yingyan Lin
2021 arXiv   pre-print
search for (1) efficient Deep Neural Network (DNN) structures and then (2) corresponding accelerators, to enable Real-Time and high-quality Reconstruction of ECG signals from EGM signals.  ...  large and discrete accelerator design space to generate optimized accelerators.  ...  PRELIMINARIES OF DEEP NEURAL NETWORKS (DNNS) AND THE EGM/ECG DATA FORMAT Deep Neural Networks (DNNs).  ... 
arXiv:2111.02569v1 fatcat:2onqyiqe45bu7icjsw2mcoasne
Showing results 1–15 of 169.