An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution

Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu, Junbao Li
2019 Electronics  
Based on the ZYNQ heterogeneous platform and the coordination of resource and bandwidth issues with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise  ...  The experimental results show that the accelerator designed in this paper can achieve 17.11GOPS for 32bit floating point when it can also accelerate depthwise separable convolution, which has obvious advantages  ...  Acknowledgments: The authors would like to thank the Editor and the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.3390/electronics8030281 fatcat:mx4esrhr7zhmpfjd6gtbdsc3x4

DeepDive: An Integrative Algorithm/Architecture Co-Design for Deep Separable Convolutional Neural Networks [article]

Mohammadreza Baharani, Ushma Sunil, Kaustubh Manohar, Steven Furgurson, Hamed Tabkhi
2020 arXiv   pre-print
The execution results on Xilinx's ZCU102 FPGA board, demonstrate 47.4 and 233.3 FPS/Watt for MobileNet-V2 and a compact version of EfficientNet, respectively, as two state-of-the-art depthwise separable  ...  This paper introduces DeepDive, which is a fully-functional, vertical co-design framework, for power-efficient implementation of DSCNNs on edge FPGAs.  ...  To generate the optimized hardware for DSCNNs, the Network SoC Compiler uses pre-designed highly-optimized RTL micro-architectural blocks or synthesizable C++ model for depthwise, pointwise, and normal  ... 
arXiv:2007.09490v1 fatcat:n4o23g7rcjf33njjohv2vc26hm

A Low-Latency Inference of Randomly Wired Convolutional Neural Networks on an FPGA

Ryosuke KURAMOCHI, Hiroki NAKAHARA
2021 IEICE transactions on information and systems  
We propose an FPGA-based low-latency CNN inference for randomly wired convolutional neural networks (RWCNNs), whose layer structures are based on random graph models.  ...  Convolutional neural networks (CNNs) are widely used for image processing tasks in both embedded systems and data centers.  ...  for Evolutional Science and Technology (CREST), and the New Energy and Industrial Technology Development Organization (NEDO).  ... 
doi:10.1587/transinf.2021pap0010 fatcat:fr3ovpk34zhf5iaywqg6ddtg5a

Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets [article]

Daniel Haase, Manuel Amthor
2020 arXiv   pre-print
We introduce blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs.  ...  Moreover, our approach provides a thorough theoretical derivation, interpretation, and justification for the application of depthwise separable convolutions (DSCs) in general, which have become the basis  ...  Conclusions We introduced blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs.  ... 
arXiv:2003.13549v3 fatcat:sytecljsejd3tdorv2gze35thm

Efficient Semantic Segmentation Using Gradual Grouping

Nikitha Vallurupalli, Sriharsha Annamaneni, Girish Varma, CV Jawahar, Manu Mathew, Soyeb Nagori
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
We show that our proposed training method and efficient architecture design can improve accuracies by over 8% with depthwise separable convolutions applied on the encoder of ERFNet and attaching a light  ...  Deep CNNs for semantic segmentation have high memory and run time requirements. Various approaches have been proposed to make CNNs efficient like grouped, shuffled, depth-wise separable convolutions.  ...  Comparison with Depthwise separable, Groups and Shuffle Layers We study the effect of various efficient CNN designs on the ERFNet architecture for semantic segmentation.  ... 
doi:10.1109/cvprw.2018.00102 dblp:conf/cvpr/VallurupalliAVJ18 fatcat:d52ofiil2fbsnf3l4dmxalwqym

Efficient Semantic Segmentation using Gradual Grouping [article]

Nikitha Vallurupalli, Sriharsha Annamaneni, Girish Varma, C V Jawahar, Manu Mathew, Soyeb Nagori
2018 arXiv   pre-print
We show that our proposed training method and efficient architecture design can improve accuracies by over 8% with depth wise separable convolutions applied on the encoder of ERFNet and attaching a light  ...  Deep CNNs for semantic segmentation have high memory and run time requirements. Various approaches have been proposed to make CNNs efficient like grouped, shuffled, depth-wise separable convolutions.  ...  Comparison with Depthwise separable, Groups and Shuffle Layers We study the effect of various efficient CNN designs on the ERFNet architecture for semantic segmentation.  ... 
arXiv:1806.08522v1 fatcat:2i3c6oue3zgothkmtfarbblqdm

Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets

Daniel Haase, Manuel Amthor
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We introduce blueprint separable convolutions (BSConv) as highly efficient building blocks for CNNs.  ...  Moreover, our approach provides a thorough theoretical derivation, interpretation, and justification for the application of depthwise separable convolutions (DSCs) in general, which have become the basis  ...  convolutions (BSConv) as highly efficient building blocks for CNNs.  ... 
doi:10.1109/cvpr42600.2020.01461 dblp:conf/cvpr/HaaseA20 fatcat:c2jozmnb5bdzdah6zcnerwipji

Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks [article]

Yihui He, Jianing Qian, Jianren Wang
2019 arXiv   pre-print
In this paper, we propose a novel decomposition approach based on SVD, namely depth-wise decomposition, for expanding regular convolutions into depthwise separable convolutions while maintaining high accuracy  ...  Recently, depth-wise separable convolution has been proposed for image recognition tasks on computationally limited platforms such as robotics and self-driving cars.  ...  For example, MobileNets [29] proposed a family of lightweight convolutional neural networks based on depthwise separable convolution.  ... 
arXiv:1910.09455v1 fatcat:gxzy3feejrfnlfwms32rcx3hge

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
2022 ACM Transactions on Design Automation of Electronic Systems  
We then cover efficient on-device training to enable user customization based on the local data on mobile devices.  ...  To reduce the large design cost of these manual solutions, we discuss the AutoML framework for each of them, such as neural architecture search (NAS) and automated pruning and quantization.  ...  MobileNetV1 [125] is based on a building block called depthwise separable convolution, which consists of a 3×3 depthwise convolution layer and a 1×1 convolution layer.  ... 
doi:10.1145/3486618 fatcat:h6xwv2slo5eklift2fl24usine

DivNet: Efficient Convolutional Neural Network via Multilevel Hierarchical Architecture Design

Bachir Kaddar, Hadria Fizazi, Miguel Hernandez-Cabronero, Victor Sanchez, Joan Serra-Sagrista
2021 IEEE Access  
Designing small and efficient mobile neural networks is difficult because the challenge is to determine the architecture that achieves the best performance under a given limited computational scenario.  ...  dataset, and by 0.05%, 4.96%, and 1.13% on the CIFAR10 dataset.  ...  Second, based on DivMod, we propose DivNet, a hierarchical architecture that allows to construct efficient lightweight CNNs. This CNN model is particularly suitable for mobile designs.  ... 
doi:10.1109/access.2021.3099952 fatcat:dddvwfjekjhvldmgl2ekjl2com

Hello Edge: Keyword Spotting on Microcontrollers [article]

Yundong Zhang, Naveen Suda, Liangzhen Lai, Vikas Chandra
2018 arXiv   pre-print
We further explore the depthwise separable convolutional neural network (DS-CNN) and compare it against other neural network architectures.  ...  Keyword spotting (KWS) is a critical component for enabling speech based user interactions on smart devices. It requires real-time response and high accuracy for good user experience.  ...  We would also like to thank Pete Warden from Google's TensorFlow team for his valuable inputs and feedback on this project.  ... 
arXiv:1711.07128v3 fatcat:swrltzaqc5hvjay7ofrx3r4lwy

Guest Editors' Introduction to the Special Issue on Machine Learning Architectures and Accelerators

Xuehai Qian, Yanzhi Wang, Avinash Karanth
2020 IEEE transactions on computers  
Prior research has investigated software/algorithm optimization which includes DNN model architecture search for computation/storage reduction (e.g., depthwise-separate convolutions), model compression  ...  The training phase of the application is highly computation and data-intensive, and thus software/algorithm optimization as well as hardware acceleration are critically required.  ...  Further on, it is our pleasure to thank the Editor-in-Chief Ahmed Louri and Associate Editors Tao Li and James Hoe for their continuous help and support with all our organizational questions in connection  ... 
doi:10.1109/tc.2020.2997574 fatcat:vfng262tlvagrmfudtv44x75ly

SdcNet: A Computation-Efficient CNN for Object Recognition

Yunlong Ma, Chunyan Wang
2018 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP)  
Xception [32] is a convolutional neural network architecture based entirely on depthwise separable convolution layers. The detailed architecture is shown in Fig. 2.9.  ...  XceptionNet is claimed to be the first CNN architecture based on entirely depthwise convolutions.  ... 
doi:10.1109/icdsp.2018.8631567 dblp:conf/icdsp/Ma018 fatcat:pcfwxso4ubef3omxkolshezc44

Embedded Intelligence on FPGA: Survey, Applications and Challenges

Kah Phooi Seng, Paik Jen Lee, Li Minn Ang
2021 Electronics  
This paper presents an overview and review of embedded intelligence on FPGA with a focus on applications, platforms and challenges.  ...  There are several challenges to be addressed to realize efficient EI implementations in hardware such as the need for: (1) high computational processing; (2) low power consumption (or high energy efficiency  ...  A depthwise separable CNN is actually a factorization from the traditional convolution to depthwise convolution and pointwise convolution.  ... 
doi:10.3390/electronics10080895 fatcat:igqk3n2kp5f4bmt6ho2qa3baau

Efficient Two-Stream Network for Violence Detection Using Separable Convolutional LSTM [article]

Zahidul Islam, Mohammad Rukonuzzaman, Raiyan Ahmed, Md. Hasanul Kabir, Moshiur Farazi
2021 arXiv   pre-print
In this work, we propose an efficient two-stream deep learning architecture leveraging Separable Convolutional LSTM (SepConvLSTM) and pre-trained MobileNet where one stream takes in background suppressed  ...  SepConvLSTM is constructed by replacing convolution operation at each gate of ConvLSTM with a depthwise separable convolution that enables producing robust long-range Spatio-temporal features while using  ...  In our work, we leveraged MobileNet [20] which is a lightweight 2D CNN that uses depthwise separable convolutions and clever design choices to develop a fast and efficient model geared towards mobile  ... 
arXiv:2102.10590v3 fatcat:7pkwozvzuzczdhehi3a7gcnaie