
A Design Methodology for Efficient Implementation of Deconvolutional Neural Networks on an FPGA [article]

Xinyu Zhang, Srinjoy Das, Ojash Neopane, Ken Kreutz-Delgado
2017 arXiv   pre-print
However, to date, there has been little research on the use of FPGA implementations of deconvolutional neural networks (DCNNs).  ...  map these DCNNs to an FPGA DCNN-plus-accelerator implementation to perform generative inference on a Xilinx Zynq-7000 FPGA.  ...  Section III presents our methodology for efficiently implementing an FPGA-based deconvolution accelerator. Section IV explains our three-step design methodology.  ... 
arXiv:1705.02583v1 fatcat:qpscnbb5lzfmxfisbmautgy6om

An Energy-Efficient FPGA-based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution [article]

Jung-Woo Chang, Keon-Woo Kang, Suk-Ju Kang
2018 arXiv   pre-print
First, we propose a new methodology for optimizing the deconvolutional neural networks (DCNNs) used for increasing feature maps.  ...  Finally, we quantize and compress a DCNN-based SR algorithm into an optimal model for efficient inference using on-chip memory.  ...  Deconvolutional Neural Networks III. RELATED WORK A.  ... 
arXiv:1801.05997v3 fatcat:mbhed5c2lncqxitgmz5uhnivwy

A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications? [article]

Ian Colbert, Jake Daly, Ken Kreutz-Delgado, Srinjoy Das
2021 arXiv   pre-print
As such, we design a spatio-temporally parallelized hardware architecture capable of accelerating a deconvolution algorithm optimized for power-efficient inference on a resource-limited FPGA.  ...  We propose this FPGA-based accelerator to be used for Deconvolutional Neural Network (DCNN) inference in low-power edge computing applications.  ...  ACKNOWLEDGEMENTS This work was supported in part by NSF awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, the University of California Office of the President, and the California Institute for  ... 
arXiv:2102.00294v2 fatcat:x6gzg7v2anhprauyrghgowlkcm

Binarized Encoder-Decoder Network and Binarized Deconvolution Engine for Semantic Segmentation

Hyunwoo Kim, Jeonghoon Kim, Jungwook Choi, Jungkeol Lee, Yong Ho Song
2020 IEEE Access  
BEDN has a network size of 0.21 MB, and its maximum memory usage is 1.38 MB. BiDE was implemented on Xilinx ZU7EV field-programmable gate array (FPGA) to operate at 187.5 MHz.  ...  For this reason, the segmentation network requires a lot of hardware resources and power consumption, and it is difficult to be applied to an environment where they are limited, such as an embedded system  ...  ACKNOWLEDGMENT Hyunwoo Kim was in charge of the development of hardware accelerator (BiDE) and Jeong Hoon Kim was in charge of network design and training (BEDN).  ... 
doi:10.1109/access.2020.3048375 fatcat:kcy6zjhtzzgslmhth367gtephm

2020 Index IEEE Transactions on Very Large Scale Integration (VLSI) Systems Vol. 28

2020 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Reconfigurable Power-Efficient Ternary Content-Addressable Memory on FPGAs; TVLSI Aug. 2020 Aug. 1925Aug  ...  ., Conflux-An Asynchronous Two-to-One Multiplexor for Time-Division Multiplexing and Clockless, Tokenless Readout; TVLSI Feb. 2020 503-515 Holcomb, D., see 2685-2698 Holcomb, D.E., see 1807-1820 Homayoun  ...  ., +, TVLSI June 2020 1540-1544 An Efficient Hardware Accelerator for Structured Sparse Convolutional Neural Networks on FPGAs.  ... 
doi:10.1109/tvlsi.2020.3041879 fatcat:33vb2eia2jfjpog4wei4peq5ge

Efficient Object Detection Framework and Hardware Architecture for Remote Sensing Images

Li, Zhang, Wu
2019 Remote Sensing  
Moreover, for evaluating the performance of proposed hardware architecture, we implement it on Xilinx XC7Z100 field programmable gate array (FPGA) and test on the proposed CBFFSSD and VGG16 models.  ...  Based on the analysis and optimization of the calculation of each layer in the algorithm, we propose efficient hardware architecture of deep learning processor with multiple neural processing units (NPUs  ...  Conflicts of Interest: The authors declare no conflict of interest. Remote Sens. 2019, 11, 2376  ... 
doi:10.3390/rs11202376 fatcat:6ubha7ol5bcx7mgpn3mnt3zawy

Smart sensors using artificial intelligence for on-detector electronics and ASICs [article]

Gabriella Carini, Grzegorz Deptuch, Jennet Dickinson, Dionisio Doering, Angelo Dragone, Farah Fahim, Philip Harris, Ryan Herbst, Christian Herwig, Jin Huang, Soumyajit Mandal, Cristina Mantilla Suarez (+10 others)
2022 arXiv   pre-print
, and implementations for next generation experiments.  ...  In this paper, we discuss the motivations and potential applications for on-detector AI.  ...  Figure 2: A typical workflow to translate an ML model into an FPGA or ASIC implementation using hls4ml.  ... 
arXiv:2204.13223v1 fatcat:2lm7k2epcraelasr6n4d6nshzu

A Memristor based Unsupervised Neuromorphic System Towards Fast and Energy-Efficient GAN [article]

F. Liu, C. Liu, F. Bi
2019 arXiv   pre-print
We also proposed an efficient data flow for optimal parallelism training and testing, depending on the computation correlations between different computing blocks.  ...  In this work, we proposed a holistic solution for fast and energy-efficient GAN computation through a memristor-based neuromorphic system.  ...  In [11, 12], convolutional neural networks (CNNs) were deployed on memristor crossbar and an on-line training process with backpropagation were implemented via a hardware and software co-design methodology  ... 
arXiv:1806.01775v4 fatcat:pckbn7vgvbadbfcb2fbqju3sui

2020 Index IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Vol. 39

2020 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
., +, TCAD Oct. 2020 2668-2681 DSP-Efficient Hardware Acceleration of Convolutional Neural Network Inference on FPGAs.  ...  Entropy-Directed Scheduling for FPGA High-Level Synthesis. Shen, M., +, TCAD Oct. 2020 2588-2601 FLASH: Fast, Parallel, and Accurate Simulator for HLS.  ... 
doi:10.1109/tcad.2021.3054536 fatcat:wsw3olpxzbeclenhex3f73qlw4

Muon–Electron Pulse Shape Discrimination for Water Cherenkov Detectors Based on FPGA/SoC

Luis Guillermo Garcia, Romina Soledad Molina, Maria Liz Crespo, Sergio Carrato, Giovanni Ramponi, Andres Cicuttin, Ivan Rene Morales, Hector Perez
2021 Electronics  
One uses an artificial neural network (ANN) algorithm; the other exploits a correlation approach based on finite impulse response (FIR) filters.  ...  We describe two methods for pulse shape detection and discrimination of muons and electrons implemented on FPGA.  ...  [30] proposed a framework for deconvolutional neural networks (DCNN) hardware accelerators, focusing on the efficient utilization of the on-chip memory to improve the performance of the layers bounded  ... 
doi:10.3390/electronics10030224 fatcat:uljnj6urtbaynflwcbrf43xcy4

An Overview of Efficient Interconnection Networks for Deep Neural Network Accelerators

Seyed Morteza Nabavinejad, Mohammad Baharloo, Kun-Chih Chen, Maurizio Palesi, Tim Kogel, Masoumeh Ebrahimi
2020 IEEE Journal on Emerging and Selected Topics in Circuits and Systems  
This paper provides a comprehensive investigation of the recent advances in efficient on-chip interconnection and design methodology of the DNN accelerator design.  ...  Currently, a large body of research aims to find an efficient on-chip interconnection to achieve low-power and high-bandwidth DNN computing.  ...  Implementing deconvolution on current ReRAM-based NN Accelerators, which are optimized for convolution, can significantly degrade performance and energy efficiency.  ... 
doi:10.1109/jetcas.2020.3022920 fatcat:idqitgwnrnegbd4dhrly3xsxbi

Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs [article]

Jung-Woo Chang, Saehyun Ahn, Keon-Woo Kang, Suk-Ju Kang
2019 arXiv   pre-print
Finally, we propose an efficient architecture for implementing Winograd DeConv by designing the line buffer and exploring the design space.  ...  In this paper, we propose an efficient Winograd DeConv accelerator that combines these two orthogonal approaches on FPGAs.  ...  Finally, we designed an efficient architecture by designing the line buffer and exploring the design space for the efficient implementation.  ... 
arXiv:1911.06918v1 fatcat:g2gpdp2ikjgjtbyjr334fyrsla

FPGA-Embedded Anomaly Detection System for Milling Process

Tomasz Zabinski, Zbigniew Hajduk, Jacek Kluska, Leslaw Gniewek
2021 IEEE Access  
The main goal of this work is to design a supervising controller able to detect an anomaly in the milling process and implement the solution in Field Programmable Gate Array (FPGA) chip.  ...  The detection method relies on determining selected signal features in the frequency domain and applying an auto-associative neural network (AANN) for novelty detection.  ...  A methodology for real-time stator condition monitoring of an induction motor using a fuzzy system implemented on FPGA was presented in [18].  ... 
doi:10.1109/access.2021.3110479 fatcat:uvgc3gzxljdrhaw6wp42fqlzg4

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [article]

David Gschwend
2020 arXiv   pre-print
The ZynqNet Embedded CNN is designed for image classification on ImageNet and consists of ZynqNet CNN, an optimized and customized CNN topology, and the ZynqNet FPGA Accelerator, an FPGA-based architecture  ...  This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip.  ...  Acknowledgement First and foremost, I would like to thank my supervisor Emanuel Schmid for the pleasant  ... 
arXiv:2005.06892v1 fatcat:tduahjb5w5cjromemahngmt3gy

FPGA Implementation of a Novel Gaussian Filter Using Power Optimized Approximate Adders

Jamshid M Basheer, Murugesh V
2018 Indonesian Journal of Electrical Engineering and Computer Science  
This paper discusses the implementation of a novel Gaussian smoothing filter with low power approximate adders in Field Programmable Gate Array (FPGA).  ...  Hence the hardware implementation of the Gaussian filter becomes a reliable solution for real time image processing applications.  ...  Efficient modified guided filter architecture is implemented on Xilinx FPGA device which offers less cost, speed and low power.  ... 
doi:10.11591/ijeecs.v11.i3.pp1048-1059 fatcat:hcnrb7gearhync5dwegjxezdxi