
A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications? [article]

Ian Colbert, Jake Daly, Ken Kreutz-Delgado, Srinjoy Das
2021 arXiv   pre-print
We propose this FPGA-based accelerator to be used for Deconvolutional Neural Network (DCNN) inference in low-power edge computing applications.  ...  As such, we design a spatio-temporally parallelized hardware architecture capable of accelerating a deconvolution algorithm optimized for power-efficient inference on a resource-limited FPGA.  ...  We would also like to thank Parimal Patel and Stephen Neuendorffer at Xilinx and Byungheon Jeon at UC San Diego.  ... 
arXiv:2102.00294v2 fatcat:x6gzg7v2anhprauyrghgowlkcm
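
The accelerator described above targets deconvolution (transposed convolution) layers. For reference, here is a minimal NumPy sketch of a plain single-channel 2-D transposed convolution with no padding; it is a generic textbook formulation, not the spatio-temporally parallelized algorithm the paper proposes, and the stride-2 value is an arbitrary example.

import numpy as np

def transposed_conv2d(x, w, stride=2):
    """Naive transposed convolution: scatter each input element,
    scaled by the kernel, into the output feature map."""
    h_in, w_in = x.shape
    k_h, k_w = w.shape
    h_out = (h_in - 1) * stride + k_h
    w_out = (w_in - 1) * stride + k_w
    y = np.zeros((h_out, w_out), dtype=x.dtype)
    for i in range(h_in):
        for j in range(w_in):
            # Each input pixel contributes a scaled copy of the kernel.
            y[i * stride:i * stride + k_h, j * stride:j * stride + k_w] += x[i, j] * w
    return y

x = np.arange(9, dtype=np.float32).reshape(3, 3)
w = np.ones((3, 3), dtype=np.float32)
print(transposed_conv2d(x, w).shape)  # (7, 7) with stride 2

The scatter formulation makes the overlapping output windows explicit, which is the access pattern any parallel hardware mapping of deconvolution has to schedule.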

Review of deep learning: concepts, CNN architectures, challenges, applications, future directions

Laith Alzubaidi, Jinglan Zhang, Amjad J. Humaidi, Ayad Al-Dujaili, Ye Duan, Omran Al-Shamma, J. Santamaría, Mohammed A. Fadhel, Muthana Al-Amidie, Laith Farhan
2021 Journal of Big Data  
It is followed by a list of the major DL applications. Computational tools including FPGA, GPU, and CPU are summarized along with a description of their influence on DL.  ...  Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching or even beating those provided  ...  In a comparison table, FPGAs are noted as customizable, while GPUs offer greater floating-point capability for DCNN training.  ... 
doi:10.1186/s40537-021-00444-8 pmid:33816053 pmcid:PMC8010506 fatcat:x2h5qs5c2jbntipu7oi5hfnb6u

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks [article]

Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W. Keckler, and William J. Dally
2017 arXiv   pre-print
Our results show that on contemporary neural networks, SCNN can improve performance and energy by factors of 2.7x and 2.3x, respectively, over a comparably provisioned dense CNN accelerator.  ...  In addition, the accumulation of multiplication products is performed in a novel accumulator array.  ...  Today, training is often done on GPUs [24] or farms of GPUs, while inference depends on the application and can employ CPUs, GPUs, FPGAs, or specially-built ASICs.  ... 
arXiv:1708.04485v1 fatcat:pt53mgyw5zh35ct3q35iopn3ba
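
The SCNN snippet above mentions multiplying only nonzero weights and activations and accumulating the products in an accumulator array. The sketch below is a 1-D, single-channel software analogue of that Cartesian-product idea, written for illustration only; the coordinate arithmetic and data layout are simplifying assumptions, not the paper's hardware dataflow.

import numpy as np

def sparse_conv1d(act, wgt, out_len):
    """act and wgt are lists of (index, value) pairs for nonzero entries."""
    out = np.zeros(out_len, dtype=np.float32)
    for ai, av in act:           # every nonzero activation ...
        for wi, wv in wgt:       # ... is paired with every nonzero weight
            oi = ai - wi         # output coordinate this product contributes to
            if 0 <= oi < out_len:
                out[oi] += av * wv   # scatter-accumulate into the output array
    return out

# Dense equivalents: activations [1, 0, 0, 2, 0, -1], 3-tap kernel [0.5, 0, 1.5].
acts = [(0, 1.0), (3, 2.0), (5, -1.0)]
wgts = [(0, 0.5), (2, 1.5)]
print(sparse_conv1d(acts, wgts, out_len=4))   # [0.5, 3.0, 0.0, -0.5]

The point of this formulation is that work scales with the number of nonzero (activation, weight) pairs rather than with the dense layer size.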

Exploring the Design Space of Deep Convolutional Neural Networks at Large Scale [article]

Forrest Iandola
2016 arXiv   pre-print
In recent years, the research community has discovered that deep neural networks (DNNs) and convolutional neural networks (CNNs) can yield higher accuracy than all previous solutions to a broad array of  ...  Instead, the "right" CNN/DNN architecture varies depending on the application at hand. CNN/DNNs comprise an enormous design space.  ...  Quantity of Communication: A computational resource (e.g., one mobile phone or a cluster of servers) has a limited quantity of computation that it can perform each second.  ... 
arXiv:1612.06519v1 fatcat:jwo2gyfjvfh3lbkfdntctx24o4
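
The last snippet gestures at per-second compute and communication budgets. A back-of-the-envelope latency bound in that spirit is sketched below; every number is a made-up placeholder for illustration, not a figure taken from the dissertation.

# Rough lower bound on a CNN layer's latency from separate compute and
# communication budgets (all values are hypothetical placeholders).
layer_flops = 2 * 100e6       # 100M multiply-accumulates, two FLOPs each
layer_bytes = 25e6            # weights + activations moved for one forward pass

peak_flops_per_s = 50e9       # assumed compute budget of the device
peak_bytes_per_s = 10e9       # assumed memory/interconnect bandwidth

t_compute = layer_flops / peak_flops_per_s
t_comm = layer_bytes / peak_bytes_per_s
print(f"compute: {t_compute*1e3:.1f} ms, communication: {t_comm*1e3:.1f} ms, "
      f"lower bound: {max(t_compute, t_comm)*1e3:.1f} ms")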

FATNN: Fast and Accurate Ternary Neural Networks [article]

Peng Chen, Bohan Zhuang, Chunhua Shen
2021 arXiv   pre-print
To tackle these two challenges, in this work, we first show that, under some mild constraints, the computational complexity of the ternary inner product can be reduced by a factor of 2.  ...  Moreover, there is still a significant gap in accuracy between TNNs and full-precision networks, hampering their deployment in real applications.  ...  However, a significant obstacle for deploying DCNN algorithms to mobile/embedded edge devices with limited computing resources is the ever-growing computational complexity: in order to achieve good accuracy  ... 
arXiv:2008.05101v4 fatcat:ojcqdwbtkfeaja4iwcslyzwd7a
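
For context on the ternary inner product mentioned above, the sketch below shows a common baseline encoding: each ternary vector is stored as a +1 bitmap and a -1 bitmap, and the dot product is computed with bitwise ANDs and popcounts. This is a generic reference scheme assumed for illustration; it is not the constrained encoding FATNN uses to obtain its claimed 2x complexity reduction.

def encode_ternary(vec):
    """Pack a {-1, 0, +1} vector into a (+1-bitmap, -1-bitmap) pair."""
    pos = neg = 0
    for i, v in enumerate(vec):
        if v == 1:
            pos |= 1 << i
        elif v == -1:
            neg |= 1 << i
    return pos, neg

def ternary_dot(a, b):
    a_pos, a_neg = a
    b_pos, b_neg = b
    # Matching signs contribute +1, opposite signs -1, zeros drop out entirely.
    return (bin(a_pos & b_pos).count("1") + bin(a_neg & b_neg).count("1")
            - bin(a_pos & b_neg).count("1") - bin(a_neg & b_pos).count("1"))

x = [1, 0, -1, 1, -1, 0, 1]
y = [1, -1, -1, 0, 1, 1, -1]
assert ternary_dot(encode_ternary(x), encode_ternary(y)) == sum(a * b for a, b in zip(x, y))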

Exploration of Energy Efficient Hardware and Algorithms for Deep Learning

Syed Sarwar
2019
Deep Neural Networks (DNNs) have emerged as the state-of-the-art technique in a wide range of machine learning tasks for analytics and computer vision in the next generation of embedded (mobile, IoT, wearable  ...  Despite their success, they suffer from high energy requirements both in inference and training.  ...  The DCNNs were trained, tested and timed using NVIDIA GPUs.  ... 
doi:10.25394/pgs.8044442.v1 fatcat:dxvbl6ofarasta4gtpfzn535hm