4,250 Hits in 4.5 sec

Shift-based Primitives for Efficient Convolutional Neural Networks [article]

Huasong Zhong, Xianggen Liu, Yihui He, Yuchun Ma
2018 arXiv   pre-print
We propose a collection of three shift-based primitives for building efficient compact CNN-based networks.  ...  We blend these shift-based primitives with point-wise group convolution and build two inference-efficient CNN architectures named AddressNet and Enhanced AddressNet.  ...  Figure 1. Three efficient shift primitives for efficient neural network architecture design. Figure 4. Implementation of feature map right shift by depthwise convolution, proposed by [34], where  ... 
arXiv:1809.08458v2 fatcat:vowwiwbpnbgetogxiobrofsgqq
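
The figure caption above describes implementing a feature-map shift as a depthwise convolution. As a rough illustration of that general idea (not the AddressNet code), the following Python/NumPy sketch shifts each channel one pixel to the right using a per-channel one-hot 1x3 kernel; the helper name shift_right_depthwise and the zero padding are assumptions made for the example.

import numpy as np

def shift_right_depthwise(x, pad_value=0.0):
    """Illustrative sketch: shift every feature map one pixel to the right
    by applying a per-channel (depthwise) 1x3 convolution whose kernel is
    the one-hot vector [1, 0, 0], so each output pixel copies its left
    neighbour. Hypothetical helper, not the paper's implementation.
    x: array of shape (C, H, W)."""
    c, h, w = x.shape
    kernel = np.array([1.0, 0.0, 0.0])           # one-hot kernel selecting the left neighbour
    padded = np.pad(x, ((0, 0), (0, 0), (1, 1)), constant_values=pad_value)
    out = np.empty_like(x)
    for ch in range(c):                          # depthwise: one kernel per channel
        for col in range(w):
            out[ch, :, col] = padded[ch, :, col:col + 3] @ kernel
    return out

x = np.arange(12, dtype=np.float32).reshape(1, 3, 4)
print(shift_right_depthwise(x))                  # each row moves one step right, zeros enter on the left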

cuDNN: Efficient Primitives for Deep Learning [article]

Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer
2014 arXiv   pre-print
For example, integrating cuDNN into Caffe, a popular framework for convolutional networks, improves performance by 36% on a standard model while also reducing memory consumption.  ...  We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming.  ...  Spatial Convolutions: The most important computational primitive in convolutional neural networks is a special form of batched convolution.  ... 
arXiv:1410.0759v3 fatcat:ps3tknetujeudogmgl6wnpp5wq
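
The cuDNN excerpt singles out batched convolution as the core primitive. For reference, here is a deliberately naive NumPy version of that primitive (NCHW layout, stride 1, no padding); conv2d_nchw_ref is a hypothetical name, and libraries such as cuDNN implement the same computation with far more efficient GPU kernels.

import numpy as np

def conv2d_nchw_ref(x, w):
    """Naive reference of the batched convolution primitive (NCHW layout,
    stride 1, no padding) that libraries such as cuDNN optimise.
    x: (N, C, H, W) inputs, w: (K, C, R, S) filters -> (N, K, H-R+1, W-S+1)."""
    n, c, h, wd = x.shape
    k, _, r, s = w.shape
    out = np.zeros((n, k, h - r + 1, wd - s + 1), dtype=x.dtype)
    for i in range(h - r + 1):
        for j in range(wd - s + 1):
            patch = x[:, :, i:i + r, j:j + s]            # (N, C, R, S)
            # contract over channels and the kernel window for every image/filter pair
            out[:, :, i, j] = np.einsum('ncrs,kcrs->nk', patch, w)
    return out

x = np.random.rand(2, 3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv2d_nchw_ref(x, w).shape)                       # (2, 4, 6, 6)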

A Tetrahedron-Based Heat Flux Signature for Cortical Thickness Morphometry Analysis [chapter]

Yonghui Fan, Gang Wang, Natasha Lepore, Yalin Wang
2018 Lecture Notes in Computer Science  
[Table-of-contents excerpt from the proceedings volume: "... Segmentation" (249); "A Feature-Driven Active Framework for Ultrasound-Based Brain Shift Compensation" (264); "Esophageal Gross Tumor Volume Segmentation using a 3D Convolutional Neural Network" (274); "Cardiac MR Segmentation ..."; ... (645); "3D Context Enhanced Region-based Convolutional Neural Network for End-to-End Lesion Detection" (646); "A Comprehensive Approach for Learning-based Fully-Automated Inter-slice Motion Correction for Short-Axis ..."]
doi:10.1007/978-3-030-00931-1_48 pmid:30338317 pmcid:PMC6191198 fatcat:dqhvpm5xzrdqhglrfftig3qejq

Evolutionary Neural Architecture Search for Image Restoration [article]

Gerard Jacques van Wyk, Anna Sergeevna Bosman
2019 arXiv   pre-print
Convolutional neural network (CNN) architectures have traditionally been explored by human experts in a manual search process that is time-consuming and ineffectively explores the massive space of potential  ...  This paper proposes a NAS method that performs computationally efficient evolutionary search of a minimally constrained network architecture search space.  ...  Convolutional primitive: For the convolutional primitive, another divergence was made from previous NAS research.  ... 
arXiv:1812.05866v2 fatcat:n6rnm7zltbadhpoxze7gyafgpy

Automatic generation of specialized direct convolutions for mobile GPUs

Naums Mogers, Valentin Radu, Lu Li, Jack Turner, Michael O'Boyle, Christophe Dubach
2020 Proceedings of the 13th Annual Workshop on General Purpose Processing using Graphics Processing Unit  
Convolutional Neural Networks (CNNs) are a powerful and versatile tool for performing computer vision tasks in both resource-constrained settings and server-side applications.  ...  Using Lift, we show that it is possible to automatically generate code that is 10× faster than the direct convolution while using 3.6× less space than the GEMM-based convolution of the very specialized  ...  Acknowledgments: This work was supported by the Engineering and Physical Sciences Research Council (grant EP/L01503X/1), EPSRC Centre for Doctoral Training in Pervasive Parallelism at the University of  ... 
doi:10.1145/3366428.3380771 dblp:conf/ppopp/MogersRLTOD20 fatcat:342savoeijb3zaznujfmhptoku
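
The abstract contrasts direct convolution with the GEMM-based lowering it is compared against. Below is a minimal sketch of that GEMM formulation (im2col followed by one matrix multiply), which also makes the extra memory cost visible; conv2d_im2col and the single-image, stride-1, no-padding setting are simplifying assumptions, not the Lift-generated code.

import numpy as np

def conv2d_im2col(x, w):
    """Sketch of the GEMM-based convolution the paper compares against:
    lower input patches into a matrix (im2col), then do one large matmul.
    x: (C, H, W), w: (K, C, R, S), stride 1, no padding. Illustrative only."""
    c, h, wd = x.shape
    k, _, r, s = w.shape
    oh, ow = h - r + 1, wd - s + 1
    # im2col: every output position becomes a column holding its receptive field
    cols = np.empty((c * r * s, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + r, j:j + s].ravel()
    gemm = w.reshape(k, c * r * s) @ cols        # cols holds roughly R*S copies of each input element
    return gemm.reshape(k, oh, ow)

x = np.random.rand(3, 16, 16).astype(np.float32)
w = np.random.rand(8, 3, 3, 3).astype(np.float32)
print(conv2d_im2col(x, w).shape)                 # (8, 14, 14)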

Low‐complexity neuron for fixed‐point artificial neural networks with ReLU activation function in energy‐constrained wireless applications

Wen‐Long Chin, Qinyu Zhang, Tao Jiang
2021 IET Communications  
This work introduces an efficient neuron design for fixed-point artificial neural networks with the rectified linear unit (ReLU) activation function for energy-constrained wireless applications.  ...  A comparison of the proposed algorithm with the popular 16-bit fixed-point format of the convolutional network, AlexNet, indicates that computation can be reduced by 48.58% as well.  ...  ACKNOWLEDGEMENTS: The authors would like to thank the editor and reviewers for their helpful comments in improving the quality of this paper.  ... 
doi:10.1049/cmu2.12129 fatcat:l52xiiaaqrff5mjhjg2w64bx44
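
The entry describes a fixed-point neuron with a ReLU activation. A minimal sketch of such a neuron follows, assuming a Q8 fixed-point format and plain Python integers (no saturation); the format choice and helper names are illustrative and do not reproduce the paper's low-complexity design.

FRAC_BITS = 8                     # assumed Q8 fixed-point format, not the paper's exact choice

def to_fixed(v):
    return int(round(v * (1 << FRAC_BITS)))

def fixed_relu_neuron(inputs, weights, bias):
    """Minimal sketch of a fixed-point neuron with ReLU: accumulate integer
    products, rescale once, clamp negatives to zero. Hypothetical helper names."""
    acc = to_fixed(bias) << FRAC_BITS                  # keep the bias at product scale (2*FRAC_BITS)
    for x, w in zip(inputs, weights):
        acc += to_fixed(x) * to_fixed(w)               # integer multiply-accumulate
    acc >>= FRAC_BITS                                  # back to Q8
    return max(acc, 0)                                 # ReLU in the integer domain

out = fixed_relu_neuron([0.5, -0.25], [1.0, 2.0], 0.1)
print(out / (1 << FRAC_BITS))                          # ~0.1 (0.5*1 - 0.25*2 + 0.1)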

Cleanup Sketched Drawings: Deep Learning-Based Model

Amal Ahmed Hasan Mohammed, Jiazhou Chen, Fahd Abd Algalil
2022 Applied Bionics and Biomechanics  
This research paper proposes a fully convolutional neural network (FCNN) model to simplify rough raster drawings using deep learning.  ...  For evaluating the results, the mean squared error (MSE) metric was used.  ...  Techniques based on convolutional neural networks are used in Mastering Sketching  ... 
doi:10.1155/2022/2238077 pmid:35578715 pmcid:PMC9107365 fatcat:5wjndf4nbfhfva3wrjrroku6vy
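
The evaluation metric named above, mean squared error, is simple to state; the short sketch below computes it for a pair of equally sized images. The float64 cast is a precaution for integer inputs rather than anything specific to the paper.

import numpy as np

def mse(pred, target):
    """Mean squared error between two images of identical shape,
    the evaluation metric named in the entry above."""
    pred = pred.astype(np.float64)
    target = target.astype(np.float64)
    return np.mean((pred - target) ** 2)

clean = np.zeros((4, 4))
noisy = clean + 0.1
print(mse(noisy, clean))      # 0.01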

APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores [article]

Boyuan Feng, Yuke Wang, Tong Geng, Ang Li, Yufei Ding
2021 arXiv   pre-print
Over the years, accelerating neural networks with quantization has been widely studied.  ...  To break such restrictions, we introduce the first Arbitrary Precision Neural Network framework (APNN-TC) to fully exploit quantization benefits on Ampere GPU Tensor Cores.  ...  ACKNOWLEDGEMENTS We thank all anonymous reviewers for their valuable comments.  ... 
arXiv:2106.12169v2 fatcat:37is2mdkjbgbblyoki62yls7by
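
APNN-TC targets arbitrary bit-widths. As background for what "arbitrary precision" means here, the sketch below performs symmetric uniform quantization of a tensor to a chosen number of bits; it is a generic illustration and says nothing about the paper's actual quantizer or its Tensor Core kernels.

import numpy as np

def quantize_symmetric(x, bits):
    """Sketch of symmetric uniform quantization to an arbitrary bit-width:
    map real values onto the signed integer grid [-(2^(bits-1)-1), 2^(bits-1)-1].
    Generic illustration, not APNN-TC's quantizer."""
    qmax = 2 ** (bits - 1) - 1
    amax = np.max(np.abs(x))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale                      # dequantize with q * scale

x = np.array([-0.9, -0.1, 0.0, 0.4, 1.2])
for bits in (2, 4, 8):                   # arbitrary precisions, e.g. INT2 / INT4 / INT8
    q, s = quantize_symmetric(x, bits)
    print(bits, q, np.round(q * s, 3))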

Image Compression Based on Deep Learning: A Review

Hajar Maseeh Yasin, Adnan Mohsin Abdulazeez
2021 Asian Journal of Research in Computer Science  
Many kinds of neural networks are used for image compression, such as deep neural networks, artificial neural networks, recurrent neural networks, and convolutional neural networks.  ...  Image compression is an essential technology for encoding and improving various forms of images in the digital era.  ...  Presents context-based convolutional networks (CCNs) that are precise and efficient.  ... 
doi:10.9734/ajrcos/2021/v8i130193 fatcat:2fe4mfuvbffwphwtnxmay74oem

Guest Editorial: IEEE TC Special Issue On Smart Edge Computing and IoT

Luca Benini, Simone Benatti, Taekwang Jang, Abbas Rahimi
2021 IEEE transactions on computers  
The proposed system, including the fully convolutional neural networks, optimizes the detection quality and frame rate, achieving 0.96 ROC AUC in the hogweed segmentation task at 0.46 frames per second  ...  For this reason, both architectural optimizations and neural network performance tuning and analysis tools are important for the evolution of next-generation edge devices.  ...  The paper entitled "Distributed Deep Convolutional Neural Networks for the Internet-of-Things" by Disabato et al. introduces a design methodology for allocating the execution of CNNs on a distributed IoT  ... 
doi:10.1109/tc.2021.3082675 fatcat:ffx3cnnozbbivf5zokiil62irq

Mixed Precision Training of Convolutional Neural Networks using Integer Operations [article]

Dipankar Das, Naveen Mellempudi, Dheevatsa Mudigere, Dhiraj Kalamkar, Sasikanth Avancha, Kunal Banerjee, Srinivas Sridharan, Karthik Vaidyanathan, Bharat Kaul, Evangelos Georganas, Alexander Heinecke, Pradeep Dubey, Jesus Corbal, Nikita Shustrov (+3 others)
2018 arXiv   pre-print
...  common neural network operations.  ...  The nuances of developing an efficient integer convolution kernel are examined, including methods to handle overflow of the INT32 accumulator.  ...  ACKNOWLEDGMENTS: The authors would like to thank the Intel CRT-DC team that operates the Endeavor cluster and also the Excalibur cluster team for their outstanding support and assistance.  ... 
arXiv:1802.00930v2 fatcat:dscno2bvzvei7na4cnhxn33hsa
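
The excerpt mentions handling overflow of the INT32 accumulator in integer convolution kernels. One common mitigation, sketched below, is to bound how many products are summed in int32 before spilling into a wider total; the 8-bit magnitude assumption and the block sizing are illustrative choices, not necessarily the scheme used in the paper.

import numpy as np

def overflow_safe_dot(a, b, mag_bits=8):
    """Hedged sketch of one way to keep an INT32 accumulator from overflowing:
    sum only a bounded block of products in int32, then spill the partial sum
    into an int64 total. Assumes operand magnitudes fit in mag_bits bits."""
    max_product = (1 << mag_bits) ** 2              # largest possible |a_i * b_i|
    block = (2 ** 31 - 1) // max_product            # how many products safely fit in int32
    total = np.int64(0)
    for start in range(0, len(a), block):
        partial = np.sum(
            a[start:start + block].astype(np.int32) *
            b[start:start + block].astype(np.int32),
            dtype=np.int32)                         # bounded block, so no wraparound
        total += np.int64(partial)                  # spill before the next block
    return total

rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=1_000_000, dtype=np.int16)
b = rng.integers(-128, 128, size=1_000_000, dtype=np.int16)
print(overflow_safe_dot(a, b))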

GhostShiftAddNet: More Features from Energy-Efficient Operations [article]

Jia Bi, Jonathon Hare, Geoff V. Merrett
2022 arXiv   pre-print
Deep convolutional neural networks (CNNs) are computationally and memory intensive.  ...  We schedule the number of bit-shift and addition operations for different hardware platforms.  ...  Introduction Deep Convolutional Neural Networks (CNNs) have become more accurate and faster for image classification applications with large image datasets.  ... 
arXiv:2109.09495v3 fatcat:deyquecwrjbg7k7uozix4svxva
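
GhostShiftAddNet builds layers from bit shifts and additions instead of multiplications. The sketch below shows the arithmetic trick in isolation: a weight rounded to a signed power of two turns each multiply into a single shift. The quantizer and helper names are hypothetical and much simpler than the paper's layers.

import math

def quantize_to_power_of_two(w):
    """Round a real-valued weight to +/- 2^k so the multiply becomes a shift.
    Illustrative only; the paper's quantizer may differ."""
    if w == 0:
        return 0, 0
    sign = 1 if w > 0 else -1
    k = round(math.log2(abs(w)))
    return sign, k

def shift_mul(x, w):
    """x * w with w restricted to a signed power of two: one shift, no multiply."""
    sign, k = quantize_to_power_of_two(w)
    return sign * (x << k) if k >= 0 else sign * (x >> -k)

print(shift_mul(13, -4.0))    # -52: 13 * (-2^2) done as a left shift
print(shift_mul(96, 0.5))     # 48:  96 * 2^-1 done as a right shift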

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
2022 ACM Transactions on Design Automation of Electronic Systems  
We then cover efficient on-device training to enable user customization based on the local data on mobile devices.  ...  Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing, and speech recognition.  ...  Efficient NLP Primitive.  ... 
doi:10.1145/3486618 fatcat:h6xwv2slo5eklift2fl24usine

Flexible Acceleration of Convolutions on FPGAs: NEURAghe 2.0

Marco Carreras, Gianfranco Deriu, Paolo Meloni
2019 CPS Summer School  
Convolutional Neural Networks are commonly employed in applications involving Computer Vision tasks like image/video classification/recognition/segmentation.  ...  In multiple scenarios, a convolution approach applied on the time dimension, hereafter called a Temporal Convolution Network (TCN), can outperform classic strategies relying on recurrent networks in terms  ...  To our knowledge, there are no FPGA-based architectures that have specifically tackled the problem of hardware acceleration for sequences processed by Temporal Convolutional Neural Networks.  ... 
dblp:conf/cpsschool/CarrerasDM19 fatcat:4zu4me7pxna4rnfyjh555zfi5y
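
The snippet defines a Temporal Convolution Network as convolution applied along the time dimension. A minimal causal 1D convolution, the basic building block of a TCN, is sketched below in NumPy; the left-only padding and optional dilation follow the usual TCN formulation rather than the NEURAghe hardware kernel.

import numpy as np

def causal_conv1d(x, kernel, dilation=1):
    """Sketch of the temporal-convolution building block referenced above:
    a 1D convolution that only looks at past samples (left padding), with an
    optional dilation. Illustrative, not the NEURAghe accelerator kernel."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad, dtype=x.dtype), x])
    out = np.empty_like(x)
    for t in range(len(x)):
        window = xp[t:t + pad + 1:dilation]      # current sample and its past, never the future
        out[t] = window @ kernel
    return out

x = np.arange(8, dtype=np.float32)
print(causal_conv1d(x, np.array([0.5, 0.5], dtype=np.float32)))   # average of each sample and its predecessor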

Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs [article]

Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini
2020 arXiv   pre-print
The spread of deep learning on embedded devices has prompted the development of numerous methods to optimise the deployment of deep neural networks (DNN).  ...  Works have mainly focused on: i) efficient DNN architectures, ii) network optimisation techniques such as pruning and quantisation, iii) optimised algorithms to speed up the execution of the most computational  ...  elements of primitive design which can be combined to build optimised kernels to implement neural network algorithms.  ... 
arXiv:2006.05181v2 fatcat:bfqm7genmngpxf3lzbpvh6fq3y
Showing results 1 — 15 out of 4,250 results