An Updated Survey of Efficient Hardware Architectures for Accelerating Deep Convolutional Neural Networks

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Muhammad Shafique, Guido Masera, Maurizio Martina
2020 Future Internet  
Deep Neural Networks (DNNs) are nowadays a common practice in most of the Artificial Intelligence (AI) applications.  ...  In this paper, the reader will first understand what a hardware accelerator is, and what are its main components, followed by the latest techniques in the field of dataflow, reconfigurability, variable  ...  For example, a GPU for server applications is difficult to compare with an accelerator based on ASIC or FPGA for embedded applications.  ... 
doi:10.3390/fi12070113 fatcat:heyq4l3rkrdc5p55xdbhsh4jxu

Reconfigurable Hardware Accelerators: Opportunities, Trends, and Challenges [article]

Chao Wang, Wenqi Lou, Lei Gong, Lihui Jin, Luchao Tan, Yahui Hu, Xi Li, Xuehai Zhou
2017 arXiv   pre-print
At present, the implementation of heterogeneous accelerators mainly relies on heterogeneous computing units such as Application-specific Integrated Circuit (ASIC), Graphics Processing Unit (GPU), and Field  ...  Nowadays, in top-tier conferences of computer architecture, emerging a batch of accelerating works based on FPGA or other reconfigurable architectures.  ...  Reference [48] proposes an efficient implementation of the large-scale recurrent neural network on GPU and proves the scalability of this implementation on the GPU.  ... 
arXiv:1712.04771v1 fatcat:3lxv45qb4zaqpagtn3eghrmroe

Recent Trends and Improvisations in FPGA

M. Joy Daniel, K. Siva Kumar M.E.
2017 IOSR Journal of Electrical and Electronics Engineering  
The developers promote FPGA in photonics network and new architectures to provide speed and user defined specifications. FPGA and ASIC plays an important role in the advancements in IoT.  ...  In this paper, the various and recent trends and improvements of FPGA are produced.  ...  They have the potential for higher performance and compared with ASIC which offer lower non-recurrent engineering costs, reduced development time, easier debugging and reduced risk.  ... 
doi:10.9790/1676-1203027177 fatcat:isael7icofgcxosiszjm5k6dv4

FPGA Acceleration of Recurrent Neural Network Based Language Model

Sicheng Li, Chunpeng Wu, Hai Li, Boxun Li, Yu Wang, Qinru Qiu
2015 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines  
However, the use of RNNLM has been greatly hindered for the high computation cost in training. This work presents an FPGA implementation framework for RNNLM training acceleration.  ...  Recurrent neural network (RNN) based language model (RNNLM) is a biologically inspired model for natural language processing.  ...  The hardware acceleration, thus, is necessary and implementations in ASICs [14], GPUs [15] and FPGAs [16] have been explored.  ... 
doi:10.1109/fccm.2015.50 dblp:conf/fccm/LiWLLWQ15 fatcat:dk66yqbdfvc2niu2acs3rwfn3q

Embedded Intelligence on FPGA: Survey, Applications and Challenges

Kah Phooi Seng, Paik Jen Lee, Li Minn Ang
2021 Electronics  
There are four main classification and thematic descriptors which are reviewed and discussed in this paper for EI: (1) EI techniques including machine learning and neural networks, deep learning, expert  ...  This paper presents an overview and review of embedded intelligence on FPGA with a focus on applications, platforms and challenges.  ...  In particular, deep neural networks (DNNs) such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been shown to be highly successful for diverse applications in healthcare  ... 
doi:10.3390/electronics10080895 fatcat:igqk3n2kp5f4bmt6ho2qa3baau

Deep Neural Network Approximation for Custom Hardware: Where We've Been, Where We're Going [article]

Erwei Wang, James J. Davis, Ruizhe Zhao, Ho-Cheung Ng, Xinyu Niu, Wayne Luk, Peter Y. K. Cheung, George A. Constantinides
2019 arXiv   pre-print
This article represents the first survey providing detailed comparisons of custom hardware accelerators featuring approximation for both convolutional and recurrent neural networks, through which we hope  ...  Research has shown that custom hardware-based neural network accelerators can surpass their general-purpose processor equivalents in terms of both throughput and energy efficiency.  ...  More specifically, we make the following novel contributions: • We motivate DNN approximation for custom hardware by comparing the so-called roofline models [107] of comparable FPGA, ASIC, CPU and GPU  ... 
arXiv:1901.06955v3 fatcat:rkgo2oisdrgv3dtnbtlldlkpba

GPU-Based Embedded Intelligence Architectures and Applications

Li Minn Ang, Kah Phooi Seng
2021 Electronics  
overview and classifications of GPU-based EI research are presented to give the full spectrum in this area that also serves as a concise summary of the scope of the paper; (2) Second, various architecture  ...  technologies for GPU-based deep learning techniques and applications are discussed in detail; and (3) Third, various architecture technologies for machine learning techniques and applications are discussed  ...  The authors in [19] proposed an approach for training RNNs (recurrent neural networks) on multiple GPUs.  ... 
doi:10.3390/electronics10080952 fatcat:paubm2sevbhixi2in63ayflmti

Horizontal Review on Video Surveillance for Smart Cities: Edge Devices, Applications, Datasets, and Future Trends

Mostafa Ahmed Ezzat, Mohamed A. Abd El Ghany, Sultan Almotairi, Mohammed A.-M. Salem
2021 Sensors  
Namely, the application of video surveillance in smart cities, algorithms, datasets, and embedded systems.  ...  The automation strategy of today's smart cities relies on large IoT (Internet of Things) systems that collect big data analytics to gain insights.  ...  , ASIC, CPU and GPU different hardware comparison with different algorithms.  ... 
doi:10.3390/s21093222 pmid:34066509 fatcat:27lploodmvdl3k36x4jzw2acly

Machine Learning at the Network Edge: A Survey [article]

M.G. Sarwar Murshed, Christopher Murphy, Daqing Hou, Nazar Khan, Ganesh Ananthanarayanan, Faraz Hussain
2021 arXiv   pre-print
, frameworks, and hardware used in successful applications of intelligent edge systems.  ...  Resource-constrained IoT devices, such as sensors and actuators, have become ubiquitous in recent years.  ...  It supports DL without the help of a cloud server and has some neural network APIs to support hardware acceleration 13 .  ... 
arXiv:1908.00080v4 fatcat:mw4lwwvzf5gupjr6pgdgnabeuu

An FPGA Accelerated Method for Training Feed-forward Neural Networks Using Alternating Direction Method of Multipliers and LSMR [article]

Seyedeh Niusha Alavi Foumani, Ce Guo, Wayne Luk
2020 arXiv   pre-print
In this project, we have successfully designed, implemented, deployed and tested a novel FPGA accelerated algorithm for neural network training.  ...  As an intermediate stage, we fully implemented the ADMM-LSMR method in C language for feed-forward neural networks with a flexible number of layers and hidden size.  ...  In Recurrent Neural Networks, this becomes even more crucial [10].  ... 
arXiv:2009.02784v1 fatcat:erbcecjiira6bkjdsjeaw3ksem

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing  ...  This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions.  ...  hardware, e.g., CPU, GPU, ASIC, and FPGA.  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions [article]

Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis
2018 arXiv   pre-print
To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs.  ...  In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks.  ...  Power-normalised throughput and latency should also be reported for comparison with other parallel architectures such as CPUs, GPUs and DSPs.  ... 
arXiv:1803.05900v1 fatcat:3gkwtxuahrghhmhz4nmkpqe7we

Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead

Maurizio Capra, Beatrice Bussolino, Alberto Marchisio, Guido Masera, Maurizio Martina, Muhammad Shafique
2020 IEEE Access  
This work summarizes and compares the works for four leading platforms for the execution of algorithms such as CPU, GPU, FPGA and ASIC describing the main solutions of the state-of-the-art, giving much  ...  This paper first introduces the key properties of two brain-inspired models like Deep Neural Network (DNN), and Spiking Neural Network (SNN), and then analyzes techniques to produce efficient and high-performance  ...  Conversely to GPUs, FPGA and ASIC accelerators have a limited amount of memory.  ... 
doi:10.1109/access.2020.3039858 fatcat:nticzqgrznftrcji4krhyjxudu

An Efficient Hardware Design for Accelerating Sparse CNNs with NAS-based Models

Yun Liang, Liqiang Lu, Yicheng Jin, Jiaming Xie, Ruirui Huang, Jiansong Zhang, Wei Lin
2021 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Deep convolutional neural networks (CNNs) have achieved remarkable performance at the cost of huge computation.  ...  As the CNN models become more complex and deeper, compressing CNNs to sparse by pruning the redundant connection in the networks has emerged as an attractive approach to reduce the amount of computation  ...  ACKNOWLEDGEMENT This work was supported in part by the Beijing Natural Science Foundation (No. JQ19014) and in part by the Beijing Academy of Artificial Intelligence (BAAI).  ... 
doi:10.1109/tcad.2021.3066563 fatcat:vxqd4ez64zgxxcwuy5uq2txmpy

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [article]

David Gschwend
2020 arXiv   pre-print
Convolutional Neural Networks (CNNs) presently achieve record-breaking accuracies in all image understanding benchmarks, but have a very high computational complexity.  ...  It accelerates the full network based on a nested-loop algorithm which minimizes the number of arithmetic operations and memory accesses.  ...  Acknowledgement First and foremost, I would like to thank my supervisor Emanuel Schmid for the pleasant  ... 
arXiv:2005.06892v1 fatcat:tduahjb5w5cjromemahngmt3gy