Hardware Acceleration of Computer Vision and Deep Learning Algorithms on the Edge using OpenCL
2019
EAI Endorsed Transactions on Cloud Systems
This work proposes a low-cost, scalable, compute-at-the-edge solution using FPGA and OpenCL. ...
The paper proposes a methodology that can be used to accelerate traditional as well as machine learning based computer vision algorithms. ...
We would like to thank our colleagues at Intel Bangalore and Intel Penang who supported this activity ...
doi:10.4108/eai.5-11-2019.162597
fatcat:2grwelq3dvft3f7xot47pbdwte
A Unified Optimization Approach for CNN Model Inference on Integrated GPUs
2019
Proceedings of the 48th International Conference on Parallel Processing - ICPP 2019
The first author appreciates the advice of Prof. John D. ...
ACKNOWLEDGMENT The authors thank the anonymous reviewers of the paper for valuable comments. ...
commonly used computer vision models on popular edge devices? ...
doi:10.1145/3337821.3337839
dblp:conf/icpp/WangCLWZLW19
fatcat:ptvsneujwjdmhesvcrune7rqwy
Energy-efficient FPGA Implementation of the k-Nearest Neighbors Algorithm Using OpenCL
2016
Position Papers of the 2016 Federated Conference on Computer Science and Information Systems
Multiple fairly different implementations of the algorithm are considered and their performance on FPGA and GPU is compared. ...
Modern SoCs are getting increasingly heterogeneous with a combination of multi-core architectures and hardware accelerators to speed up the execution of compute-intensive tasks at considerably lower power ...
This work is also supported in part by the European Commission through the ECOSCALE project (H2020-ICT-671632). ...
doi:10.15439/2016f327
dblp:conf/fedcsis/MuslimDMLQ16
fatcat:c7gspjezb5ek3dx2hmudvkedkm
An FPGA-based architecture for embedded systems performance acceleration applied to Optimum-Path Forest classifier
2017
Microprocessors and microsystems
Acknowledgment This work is supported by the PDSE program of Coordination for Improvement of High Education Personnel (CAPES) of Brazilian Ministry of Education, process 680 nº13077/2013-09 and carried ...
out in the framework of the Labex MS2T, funded by the French Government through the program "Investments for the future" managed by the National Agency for Research (Reference ANR-11-IDEX-0004-02). ...
Its implementation of a computer vision application algorithm in a SoC/FPGA board using the OpenCL language and workflow is also presented. ...
doi:10.1016/j.micpro.2017.06.013
fatcat:sp3p5ai2ovdv7awrslnbswe2te
Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs
[article]
2016
arXiv
pre-print
Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. ...
In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. ...
ACKNOWLEDGMENT This work has been partially supported by Samsung Advanced Institute of Technology, and by Xilinx and Altera University Programs, through platform donations. ...
arXiv:1609.09296v1
fatcat:fgowcrakozdmxoaq4eoutnwlvy
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
2020
IEEE Transactions on Biomedical Circuits and Systems
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and ...
adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems. ...
large ADC power consumption for computer vision tasks which rely on deep networks and millions of parameters, such as VGG-16. ...
doi:10.1109/tbcas.2020.3036081
pmid:33156792
fatcat:rjwfjd7vmvglpk762mqeyiteqq
FeCaffe: FPGA-enabled Caffe with OpenCL for Deep Learning Training and Inference on Intel Stratix 10
[article]
2019
arXiv
pre-print
FPGA-enabled Caffe, a hierarchical software and hardware design methodology based on the Caffe to enable FPGA to support mainline deep learning development features, e.g. training and inference with Caffe ...
Currently, there are many popular frameworks in the market for deep learning development, such as Caffe, TensorFlow, PyTorch, and most frameworks natively support CPU and consider GPU as the mainline ...
All of these actions are required by using the framework so that deep learning algorithm developers can focus on algorithm development only with ease. ...
arXiv:1911.08905v1
fatcat:k727mudp3neutbj7nnwbzvfk6a
Accelerating Deep Neural Networks implementation: A survey
2021
IET Computers & Digital Techniques
Deploying such Deep Neural Networks (DNN) on embedded devices is still a challenging task considering the massive requirement of computation and storage. ...
Recently, Deep Learning (DL) applications are getting more and more involved in different fields. ...
Additionally, we exposed the tools that can automatically generate hardware design from software that are used for implementing and evaluating deep learning approaches. ...
doi:10.1049/cdt2.12016
fatcat:3kl4j5ztl5eahmgv7vetu2egay
MNN: A Universal and Efficient Inference Engine
[article]
2020
arXiv
pre-print
Deploying deep learning models on mobile devices draws more and more attention recently. ...
However, designing an efficient inference engine on devices is under the great challenges of model compatibility, device diversity, and resource limitation. ...
ACKNOWLEDGEMENTS We thank Chaoyue Niu for helpful discussions and the anonymous reviewers for their valuable comments to improve our work. ...
arXiv:2002.12418v1
fatcat:ppeykiv57nc6bfqa74lyzse3by
Guest Editorial: Special Issue on Embedded Computer Vision
2018
Journal of Signal Processing Systems
We present papers describing a range of novel solutions: a deep learning accelerator, a robust aerial tracking system, an FPGA-based aerial visual servoing task solution, an approach to use low-cost hardware ...
We are pleased to include six state-of-the-art papers from the leaders in this field, both from industry and academia, who keep pushing the embedded computer vision technology forward. ...
In their paper "Efficient Object Detection Using Georgia Tech) discuss their research on energy efficient deep learning accelerators and their application on vision-based object detection. ...
doi:10.1007/s11265-018-1365-8
fatcat:mbpznn6mzfeuvhphuznn4kb6wy
Hardware Implementation of Deep Network Accelerators Towards Healthcare and Biomedical Applications
[article]
2020
arXiv
pre-print
With the advent of dedicated Deep Learning (DL) accelerators and neuromorphic processors, new opportunities are emerging for applying deep and Spiking Neural Network (SNN) algorithms to healthcare and ...
adoption of these tools, as we shed light on the future of deep networks and spiking neuromorphic processing systems as proponents for driving biomedical circuits and systems forward. ...
large ADC power consumption for computer vision tasks which rely on deep networks and millions of parameters, such as VGG-16. ...
arXiv:2007.05657v1
fatcat:amqutl3suvgq5nygna4ef36usy
Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
2018
2018 IEEE International Symposium on Workload Characterization (IISWC)
In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight ...
pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU). ...
The opinions expressed and arguments employed herein do not necessarily reflect the official views of these funding bodies. ...
doi:10.1109/iiswc.2018.8573503
dblp:conf/iiswc/TurnerCRCOS18
fatcat:hxxhuovm6fhyhheg55vtwyvsoi
EDSSA: An Encoder-Decoder Semantic Segmentation Networks Accelerator on OpenCL-Based FPGA Platform
2020
Sensors
We introduce the related technologies, architecture design, algorithm optimization, and hardware implementation of the Encoder-Decoder semantic segmentation network SegNet as an example, and undertake ...
the FPGA platforms that support Open Computing Language (OpenCL) development. ...
Conflicts of Interest: The authors declare no conflict of interest. Sensors 2020, 20, 3969 ...
doi:10.3390/s20143969
pmid:32708851
fatcat:bfzh6ou5djdf5j46734bnbdjgy
Characterising Across-Stack Optimisations for Deep Convolutional Neural Networks
[article]
2018
arXiv
pre-print
In this paper we unify the two viewpoints in a Deep Learning Inference Stack and take an across-stack approach by implementing and evaluating the most common neural network compression techniques (weight ...
pruning, channel pruning, and quantisation) and optimising their parallel execution with a range of programming approaches (OpenMP, OpenCL) and hardware architectures (CPU, GPU). ...
The opinions expressed and arguments employed herein do not necessarily reflect the official views of these funding bodies. ...
arXiv:1809.07196v1
fatcat:wxevr5hprveiro5lg2aie5nnem
NeuroHSMD: Neuromorphic Hybrid Spiking Motion Detector
[article]
2022
arXiv
pre-print
The Neuromorphic Hybrid Spiking Motion Detector (NeuroHSMD) proposed in this work accelerates the HSMD algorithm using Field-Programmable Gate Arrays (FPGAs). ...
The NeuroHSMD algorithm was compared against the HSMD algorithm, using the same 2012 change detection (CDnet2012) and 2014 change detection (CDnet2014) benchmark datasets. ...
ACKNOWLEDGMENT The authors would like to acknowledge the contributions given by Professor Martin McGinnity (technical and scientific insights), Professor Ahmad Lotfi (technical advice, comments and suggestions ...
arXiv:2112.06102v2
fatcat:7cxcx755hfaubp7i3bntt5xzca