336 Hits in 6.2 sec

Hardware Acceleration of Computer Vision and Deep Learning Algorithms on the Edge using OpenCL

B. Mishra, D. Chakraborty, S. Makkadayil, S. Patil, B. Nallani
2019 EAI Endorsed Transactions on Cloud Systems  
This work proposes a low-cost, scalable, compute-at-the-edge solution using FPGA and OpenCL.  ...  The paper proposes a methodology that can be used to accelerate traditional as well as machine learning based computer vision algorithms.  ...  We would like to thank our colleagues at Intel Bangalore and Intel Penang who supported this activity  ... 
doi:10.4108/eai.5-11-2019.162597 fatcat:2grwelq3dvft3f7xot47pbdwte

A Unified Optimization Approach for CNN Model Inference on Integrated GPUs

Leyuan Wang, Zhi Chen, Yizhi Liu, Yao Wang, Lianmin Zheng, Mu Li, Yida Wang
2019 Proceedings of the 48th International Conference on Parallel Processing - ICPP 2019  
The first author appreciates the advice of Prof. John D.  ...  ACKNOWLEDGMENT The authors thank the anonymous reviewers of the paper for valuable comments.  ...  commonly used computer vision models on popular edge devices?  ... 
doi:10.1145/3337821.3337839 dblp:conf/icpp/WangCLWZLW19 fatcat:ptvsneujwjdmhesvcrune7rqwy

Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL

Fahad Muslim, Alexandros Demian, Liang Ma, Luciano Lavagno, Affaq Qamar
2016 Position Papers of the 2016 Federated Conference on Computer Science and Information Systems  
Multiple fairly different implementations of the algorithm are considered and their performance on FPGA and GPU is compared.  ...  Modern SoCs are getting increasingly heterogeneous with a combination of multi-core architectures and hardware accelerators to speed up the execution of compute-intensive tasks at considerably lower power  ...  This work is also supported in part by the European Commission through the ECOSCALE project (H2020-ICT-671632).  ... 
doi:10.15439/2016f327 dblp:conf/fedcsis/MuslimDMLQ16 fatcat:c7gspjezb5ek3dx2hmudvkedkm

An FPGA-based architecture for embedded systems performance acceleration applied to Optimum-Path Forest classifier

Wendell F.S. Diniz, Vincent Fremont, Isabelle Fantoni, Eurípedes G.O. Nóbrega
2017 Microprocessors and microsystems  
Acknowledgment This work is supported by the PDSE program of Coordination for Improvement of High Education Personnel (CAPES) of Brazilian Ministry of Education, process 680 nº13077/2013-09 and carried  ...  out in the framework of the Labex MS2T, funded by the French Government through the program "Investments for the future" managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02).  ...  Its implementation of a computer vision application algorithm in a SoC/FPGA board using the OpenCL language and workflow is also presented.  ... 
doi:10.1016/j.micpro.2017.06.013 fatcat:sp3p5ai2ovdv7awrslnbswe2te

Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs [article]

R. Tapiador, A. Rios-Navarro, A. Linares-Barranco, Minkyu Kim, Deepak Kadetotad, Jae-sun Seo
2016 arXiv   pre-print
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia.  ...  In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented.  ...  ACKNOWLEDGMENT This work has been partially supported by Samsung Advance Institute of Technology, and by Xilinx and Altera University Programs, through platform donations.  ... 
arXiv:1609.09296v1 fatcat:fgowcrakozdmxoaq4eoutnwlvy

Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications

Mostafa Rahimi Azghadi, Corey Lammie, Jason Kamran Eshraghian, Melika Payvand, Elisa Donati, Bernabe Linares-Barranco, Giacomo Indiveri
2020 IEEE Transactions on Biomedical Circuits and Systems  
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and  ...  adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems.  ...  large ADC power consumption for computer vision tasks which rely on deep networks and millions of parameters, such as VGG-16.  ... 
doi:10.1109/tbcas.2020.3036081 pmid:33156792 fatcat:rjwfjd7vmvglpk762mqeyiteqq

FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10 [article]

Ke He, Bo Liu, Yu Zhang, Andrew Ling, Dian Gu
2019 arXiv   pre-print
FPGA-enabled Caffe, a hierarchical software and hardware design methodology based on the Caffe to enable FPGA to support mainline deep learning development features, e.g. training and inference with Caffe  ...  Currently, there are many popular frameworks in the market for deep learning development, such as Caffe, TensorFlow, Pytorch, and most of frameworks natively support CPU and consider GPU as the mainline  ...  All of these actions are required by using the framework so that deep learning algorithm developers can focus on algorithm development only with ease.  ... 
arXiv:1911.08905v1 fatcat:k727mudp3neutbj7nnwbzvfk6a

Accelerating Deep Neural Networks implementation: A survey

Meriam Dhouibi, Ahmed Karim Ben Salem, Afef Saidi, Slim Ben Saoud
2021 IET Computers & Digital Techniques  
Deploying such Deep Neural Networks (DNN) on embedded devices is still a challenging task considering the massive requirement of computation and storage.  ...  Recently, Deep Learning (DL) applications are getting more and more involved in different fields.  ...  Additionally, we exposed the tools that can automatically generate hardware design from software that are used for implementing and evaluating deep learning approaches.  ... 
doi:10.1049/cdt2.12016 fatcat:3kl4j5ztl5eahmgv7vetu2egay

MNN: A Universal and Efficient Inference Engine [article]

Xiaotang Jiang, Huan Wang, Yiliu Chen, Ziqi Wu, Lichuan Wang, Bin Zou, Yafeng Yang, Zongyang Cui, Yu Cai, Tianhang Yu, Chengfei Lv, Zhihua Wu
2020 arXiv   pre-print
Deploying deep learning models on mobile devices draws more and more attention recently.  ...  However, designing an efficient inference engine on devices is under the great challenges of model compatibility, device diversity, and resource limitation.  ...  ACKNOWLEDGEMENTS We thank Chaoyue Niu for helpful discussions and the anonymous reviewers for their valuable comments to improve our work.  ... 
arXiv:2002.12418v1 fatcat:ppeykiv57nc6bfqa74lyzse3by

Guest Editorial: Special Issue on Embedded Computer Vision

Stefano Mattoccia, Branislav Kisačanin, Margrit Gelautz, Sek Chai, Ahmed Nabil Belbachir, Goksel Dedeoglu, Fridtjof Stein
2018 Journal of Signal Processing Systems  
We present papers describing a range of novel solutions: a deep learning accelerator, a robust aerial tracking system, an FPGA-based aerial visual servoing task solution, an approach to use low-cost hardware  ...  We are pleased to include six state-of-the-art papers from the leaders in this field, both from industry and academia, who keep pushing the embedded computer vision technology forward.  ...  In their paper "Efficient Object Detection Using Georgia Tech) discuss their research on energy efficient deep learning accelerators and their application on vision-based object detection.  ... 
doi:10.1007/s11265-018-1365-8 fatcat:mbpznn6mzfeuvhphuznn4kb6wy

Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications [article]

Mostafa Rahimi Azghadi, Corey Lammie, Jason K. Eshraghian, Melika Payvand, Elisa Donati, Bernabe Linares-Barranco, Giacomo Indiveri
2020 arXiv   pre-print
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and  ...  adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems as proponents for driving biomedical circuits and systems forward.  ...  large ADC power consumption for computer vision tasks which rely on deep networks and millions of parameters, such as VGG-16.  ... 
arXiv:2007.05657v1 fatcat:amqutl3suvgq5nygna4ef36usy

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks

Jack Turner, Jose Cano, Valentin Radu, Elliot J. Crowley, Michael O'Boyle, Amos Storkey
2018 2018 IEEE International Symposium on Workload Characterization (IISWC)  
In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight  ...  pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU).  ...  The opinions expressed and arguments employed herein do not necessarily reflect the official views of these funding bodies.  ... 
doi:10.1109/iiswc.2018.8573503 dblp:conf/iiswc/TurnerCRCOS18 fatcat:hxxhuovm6fhyhheg55vtwyvsoi

EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform

Hongzhi Huang, Yakun Wu, Mengqi Yu, Xuesong Shi, Fei Qiao, Li Luo, Qi Wei, Xinjun Liu
2020 Sensors  
We introduce the related technologies, architecture design, algorithm optimization, and hardware implementation of the Encoder-Decoder semantic segmentation network SegNet as an example, and undertake  ...  the FPGA platforms that support Open Computing Language (OpenCL) development.  ...  Conflicts of Interest: The authors declare no conflict of interest. Sensors 2020, 20, 3969  ... 
doi:10.3390/s20143969 pmid:32708851 fatcat:bfzh6ou5djdf5j46734bnbdjgy

Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks [article]

Jack Turner, José Cano, Valentin Radu, Elliot J. Crowley, Michael O'Boyle, Amos Storkey
2018 arXiv   pre-print
In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight  ...  pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU).  ...  The opinions expressed and arguments employed herein do not necessarily reflect the official views of these funding bodies.  ... 
arXiv:1809.07196v1 fatcat:wxevr5hprveiro5lg2aie5nnem

NeuroHSMD: Neuromorphic Hybrid Spiking Motion Detector [article]

Pedro Machado, Andreas Oikonomou
2022 arXiv   pre-print
The Neuromorphic Hybrid Spiking Motion Detector (NeuroHSMD) proposed in this work accelerates the HSMD algorithm using Field-Programmable Gate Arrays (FPGAs).  ...  The NeuroHSMD algorithm was compared against the HSMD algorithm, using the same 2012 change detection (CDnet2012) and 2014 change detection (CDnet2014) benchmark datasets.  ...  ACKNOWLEDGMENT The authors would like to acknowledge the contributions given by Professor Martin McGinnity (technical and scientific insights), Professor Ahmad Lotfi (technical advice, comments and suggestions  ... 
arXiv:2112.06102v2 fatcat:7cxcx755hfaubp7i3bntt5xzca

Showing results 1-15 out of 336 results