155 Hits in 6.6 sec

A Heterogeneous Architecture for the Vision Processing Unit with a Hybrid Deep Neural Network Accelerator

Peng Liu, Zikai Yang, Lin Kang, Jian Wang
2022 Micromachines  
In this paper, we propose a heterogeneous architecture for the VPU with a hybrid accelerator for the DNNs. It can process the ISP, CNNs, and hybrid DNN subtasks on one unit.  ...  Meanwhile, only the CNNs and the CNN-RNN frameworks are used in the vision tasks, and few DNPUs are specifically designed for this.  ...  The Experiment Results for the DNNs and the Analysis Two types of DNNs were tested on the VPU for the vision application, including CNNs and the hybrid DNNs.  ... 
doi:10.3390/mi13020268 pmid:35208392 pmcid:PMC8878321 fatcat:vtsvmgpbtfbyzemrdfhixl4rke

Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA

Shuanglong Liu, Hongxiang Fan, Xinyu Niu, Ho-cheung Ng, Yang Chu, Wayne Luk
2018 ACM Transactions on Reconfigurable Technology and Systems  
A real-time application of scene segmentation on Cityscapes Dataset is used to evaluate our CNN accelerator on Zynq ZC706 board, and the system achieves a performance of 107 GOPS and 0.12 GOPS/DSP using  ...  Furthermore, a hardware mapping framework is developed to automatically generate the low-latency hardware design for any given CNN model on the target device.  ...  ] for computer vision applications such as scene segmentation [1] , image denoising [28] and super-resolution imaging.  ... 
doi:10.1145/3242900 fatcat:4tollsot6vdgvfnerhhftuirnu
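
The GOPS/DSP figure above can be sanity-checked with one division, assuming the ZC706's roughly 900 DSP48 slices (a device detail not stated in the snippet):

```python
# Sanity check: overall throughput divided by DSP count should give the
# reported per-DSP efficiency.  The 900 DSP48E1 slices of the ZC706's
# XC7Z045 device are an assumption, not stated in the snippet above.
total_gops = 107.0
dsp_slices = 900
print(round(total_gops / dsp_slices, 2))  # -> 0.12, matching the paper
```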

An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu, Junbao Li
2019 Electronics  
The accelerator can handle network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves full pipelining by using a data stream interface and ping-pong on-chip cache.  ...  Acknowledgments: The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.3390/electronics8030281 fatcat:mx4esrhr7zhmpfjd6gtbdsc3x4
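
The appeal of depthwise separable convolution, as used in the accelerator above, is arithmetic: factoring a standard K×K convolution into a depthwise stage plus a 1×1 pointwise stage cuts the multiply count by roughly 1/C_out + 1/K². A small worked example (the layer shape is illustrative, not taken from the paper):

```python
# Multiply counts for standard vs. depthwise separable convolution
# (illustrative layer shape, not from the paper above).
H, W = 56, 56        # output feature-map size
K = 3                # kernel size
C_in, C_out = 64, 128

standard = H * W * K * K * C_in * C_out
depthwise = H * W * K * K * C_in          # one KxK filter per input channel
pointwise = H * W * C_in * C_out          # 1x1 convolution mixes channels
separable = depthwise + pointwise

print(standard // separable)  # roughly 8x fewer multiplies for this shape
```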

PipeCNN: An OpenCL-Based FPGA Accelerator for Large-Scale Convolution Neuron Networks [article]

Dong Wang, Jianjing An, Ke Xu
2016 arXiv   pre-print
We have achieved a similar peak performance of 33.9 GOPS with a 34% resource reduction on DSP blocks compared to previous work.  ...  Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition.  ...  On the host side, C/C++ code runs on the CPU, using a vendor-specific application programming interface (API) to communicate with the kernels implemented on the FPGA accelerator.  ... 
arXiv:1611.02450v1 fatcat:jxuuikzsr5agrhsalo7xgzg354

Systolic-CNN: An OpenCL-defined Scalable Run-time-flexible FPGA Accelerator Architecture for Accelerating Convolutional Neural Network Inference in Cloud/Edge Computing [article]

Akshay Dua, Yixing Li, Fengbo Ren
2020 arXiv   pre-print
Systolic-CNN is highly scalable and parameterized, and can be easily adapted by users to achieve up to 100% utilization of the coarse-grained computation resources (i.e., DSP blocks) for a given FPGA  ...  Systolic-CNN adopts a highly pipelined and parallel 1-D systolic array architecture, which efficiently exploits both spatial and temporal parallelism for accelerating CNN inference on FPGAs.  ...  for computer vision tasks [23] [24] [25] .  ... 
arXiv:2012.03177v1 fatcat:h5alzshjybhv7kmpmeb46an3qm
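
The 1-D systolic idea referenced above can be illustrated with a toy cycle-level simulation: each PE holds one weight, inputs march through a two-register delay line, and partial sums hop one PE per cycle. This is a schematic of the general weight-stationary technique, not Systolic-CNN's actual OpenCL design:

```python
# Toy cycle-level model of a weight-stationary 1-D systolic array.
def systolic_conv1d(x, w):
    K = len(w)
    x_regs = [0] * (2 * K)   # input delay line: 2 registers per PE
    psums = [0] * K          # one partial-sum register per PE
    out = []
    for t, x_in in enumerate(list(x) + [0] * (2 * K)):  # zeros flush the pipe
        x_regs = [x_in] + x_regs[:-1]                   # inputs advance 1 reg
        prev = psums
        psums = [
            (prev[j - 1] if j else 0) + w[j] * x_regs[2 * j]
            for j in range(K)
        ]
        # the last PE emits y[t - (K-1)]; keep only fully-valid outputs
        if 2 * (K - 1) <= t <= len(x) + K - 2:
            out.append(psums[-1])
    return out

# Reference: direct 'valid'-mode 1-D convolution for comparison
def direct_conv1d(x, w):
    K = len(w)
    return [sum(w[j] * x[m - j] for j in range(K))
            for m in range(K - 1, len(x))]

print(systolic_conv1d([1, 2, 3, 4, 5], [1, 0, -1]))  # matches direct_conv1d
```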

Kalray MPPA®: Massively parallel processor array: Revisiting DSP acceleration with the Kalray MPPA Manycore processor

Benoit Dupont de Dinechin
2015 2015 IEEE Hot Chips 27 Symposium (HCS)  
SMP Linux runs on the I/O clusters; device drivers are provided for flash, I2C & SPI (sensors, small peripherals), and GPIO.  ...  Provides POSIX threads, timers, and run-time support for GCC OpenMP.  The RM manages the NoC interfaces and supports security functions; the PEs execute application code on top of  ... 
doi:10.1109/hotchips.2015.7477332 dblp:conf/hotchips/Dinechin15 fatcat:4bdjnmvj2jfszo7gqao26upe5q

An OpenCL(TM) Deep Learning Accelerator on Arria 10 [article]

Utku Aydonat, Shane O'Connell, Davor Capalija, Andrew C. Ling, Gordon R. Chiu
2017 arXiv   pre-print
As a result, when running our DLA on Intel's Arria 10 device we can achieve a performance of 1020 img/s, or 23 img/s/W when running the AlexNet CNN benchmark.  ...  Convolutional neural nets (CNNs) have become a practical means to perform vision tasks, particularly in the area of image classification.  ...  ACKNOWLEDGEMENTS We would like to thank Stephen Weston for his insightful comments and Kevin Jin for the experimental data.  ... 
arXiv:1701.03534v1 fatcat:fivnwoibxjbphdg3veuutfdjoy
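
Taken together, the two headline numbers above imply the board's power draw (a derived figure; the snippet does not state it directly):

```python
# Implied power from the reported throughput and efficiency figures.
throughput_img_s = 1020.0   # img/s, AlexNet on Arria 10
efficiency_img_s_w = 23.0   # img/s/W
print(round(throughput_img_s / efficiency_img_s_w, 1))  # ~44.3 W implied
```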

Sparse, Quantized, Full Frame CNN for Low Power Embedded Devices

Manu Mathew, Kumar Desappan, Pramod Kumar Swami, Soyeb Nagori
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
Results of implementation on the Texas Instruments TDA2x SoC [17] are presented. We have modified the Caffe CNN framework to do the sparse, quantized training described in this paper.  ...  The source code for the training is made available.  ...  It is a low-power SoC that operates at single-digit watts. It has 4 Embedded Vision Engines (EVEs), which are coprocessors suited for computer vision applications.  ... 
doi:10.1109/cvprw.2017.46 dblp:conf/cvpr/MathewDSN17 fatcat:jzf3vi2eencfrfj4suc6oishbq
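
The "sparse, quantized" training the entry describes combines two independent compressions. A minimal post-hoc sketch of both (magnitude pruning plus symmetric uniform quantization; illustrative only, not the paper's actual Caffe modification):

```python
def prune_and_quantize(weights, sparsity=0.5, bits=8):
    """Zero out the smallest-magnitude weights, then quantize the
    survivors to a symmetric uniform integer grid (illustrative only)."""
    # Magnitude pruning: zero the `sparsity` fraction with smallest |w|.
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else -1.0
    pruned = [0.0 if abs(w) <= threshold else w for w in weights]
    # Symmetric uniform quantization of the surviving weights.
    scale = max(abs(w) for w in pruned) / (2 ** (bits - 1) - 1)
    q = [round(w / scale) for w in pruned]   # integer codes
    deq = [c * scale for c in q]             # dequantized values
    return q, deq

q, deq = prune_and_quantize([0.1, -0.02, 0.5, 0.03, -0.4, 0.01])
print(q)
```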


DeepSense: A GPU-based Deep Convolutional Neural Network Framework on Commodity Mobile Devices

Loc Nguyen Huynh, Rajesh Krishna Balan, Youngki Lee
2016 Proceedings of the 2016 Workshop on Wearable Systems and Applications - WearSys '16  
Recently, a branch of machine learning algorithms called deep learning has gained huge attention for boosting the accuracy of a variety of sensing applications.  ...  Our results show that DeepSense is able to execute a variety of CNN models for image recognition, object detection and face recognition in soft real time with no or marginal accuracy tradeoffs.  ...  In addition, the authors showed that it is feasible to run an entire DNN for audio sensing applications on low-power mobile DSPs [6] .  ... 
doi:10.1145/2935643.2935650 dblp:conf/mobisys/LocBL16a fatcat:n2tfswvb6bf2tntsmahzlgqvoi

ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network [article]

David Gschwend
2020 arXiv   pre-print
This master thesis explores the potential of FPGA-based CNN acceleration and demonstrates a fully functional proof-of-concept CNN implementation on a Zynq System-on-Chip.  ...  Many applications demand embedded solutions that integrate into existing systems with tight real-time and power constraints.  ...  Acknowledgement First and foremost, I would like to thank my supervisor Emanuel Schmid for the pleasant  ... 
arXiv:2005.06892v1 fatcat:tduahjb5w5cjromemahngmt3gy

State of Art IoT and Edge Embedded Systems for Real-Time Machine Vision Applications

Mahmoud Meribout, Asma Baobaid, Mohammed Ould Khaoua, Varun Kumar Tiwari, Juan Pablo Pena
2022 IEEE Access  
It can serve as a good reference for researchers designing state-of-the-art IoT embedded systems for machine vision applications.  ...  IoT and edge devices dedicated to running machine vision algorithms usually lag a few years behind state-of-the-art technologies for hardware accelerators.  ...  Like other DNNs, CNNs are too complex to run on IoT/edge devices with limited computing power.  ... 
doi:10.1109/access.2022.3175496 fatcat:u7dp4ov5qjhxximk5xgmuigp2m

Weight Sparseness for a Feature-Map-Split-CNN Toward Low-Cost Embedded FPGAs

Akira Jinguji, Shimpei Sato, Hiroki Nakahara
2021 IEICE transactions on information and systems  
We designed a dedicated architecture of a sparse CNN and a memory buffering scheduling for a split-CNN and implemented this on the PYNQ-Z1 FPGA board with a low-end FPGA.  ...  However, CNNs have a large number of parameters, called weights, and internal data, called feature maps, which pose a challenge for FPGAs in terms of performance and memory capacity.  ...  Acknowledgments This research was supported in part by the Grants in Aid for Scientific Research of JSPS, Industry Academia Collaborative R&D Program Center of Innovation (COI) program, Core Research for  ... 
doi:10.1587/transinf.2021pap0011 fatcat:tvesj44cm5cf7az2ldzfg5jh54

A Heterogeneous Hardware Accelerator for Image Classification in Embedded Systems

Ignacio Pérez, Miguel Figueroa
2021 Sensors  
In this paper, we present a scalable, low power, low resource-utilization accelerator architecture for inference on the MobileNet V2 CNN.  ...  Although previous works have implemented CNN inference on FPGAs, their high utilization of on-chip memory and arithmetic resources complicate their application on resource-constrained edge devices.  ... 
doi:10.3390/s21082637 pmid:33918668 fatcat:kzaznvcinjebtdvqqx4wbnttvu

DSC: Dense-Sparse Convolution for Vectorized Inference of Convolutional Neural Networks

Alexander Frickenstein, Manoj Rohit Vemparala, Christian Unger, Fatih Ayar, Walter Stechele
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
CPU, are used for low-latency inference of the sparse CNNs. The proposed open-source CPU kernel scales with the vector word-length and the number of cores.  ...  The efficient application of Convolutional Neural Networks (CNNs) on automotive-rated and safety-critical hardware accelerators requires an interplay of DNN design optimization and programming techniques  ...  In the case of Nvidia GPUs and FPGAs, the partitioning of CUDA cores and DSP blocks must be performed for the sparse and dense operations, respectively.  ... 
doi:10.1109/cvprw.2019.00175 dblp:conf/cvpr/FrickensteinVUA19 fatcat:t7bjpmiyhvh6rimfoosgi7somu
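
The dense-sparse interplay described above — sparse weights, dense activations, vector-friendly loops — can be sketched as a compressed-weight dot product (a plain-Python schematic, not the authors' open-source CPU kernel):

```python
# Compress a weight vector into (values, indices) pairs, then take a dot
# product that skips zero weights while activations stay in a dense,
# vectorizable layout.  Real kernels vectorize the loop over SIMD lanes.
def compress_weights(weights, eps=0.0):
    nz = [(w, i) for i, w in enumerate(weights) if abs(w) > eps]
    vals = [w for w, _ in nz]
    idx = [i for _, i in nz]
    return vals, idx

def sparse_dense_dot(vals, idx, x):
    return sum(v * x[i] for v, i in zip(vals, idx))

vals, idx = compress_weights([0.0, 2.0, 0.0, -1.0])
print(sparse_dense_dot(vals, idx, [1.0, 2.0, 3.0, 4.0]))  # 2*2 + (-1)*4 = 0.0
```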

Boosting Mobile CNN Inference through Semantic Memory [article]

Yun Li, Chen Zhang, Shihao Han, Li Lyna Zhang, Baoqun Yin, Yunxin Liu, Mengwei Xu
2021 arXiv   pre-print
SMTM is prototyped on a commodity CNN engine and runs on both mobile CPU and GPU.  ...  For the first time, we borrow and distill such a capability into a semantic memory design, namely SMTM, to improve on-device CNN inference.  ...  On the GPU, we implement a zero-copy data  ... 
arXiv:2112.02644v1 fatcat:gfyecsojvzgaxjgmlwihov26ju
Showing results 1 — 15 out of 155 results