
TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning [article]

Riyadh Baghdadi, Abdelkader Nadir Debbagh, Kamel Abdous, Fatima Zohra Benhamida, Alex Renda, Jonathan Elliott Frankle, Michael Carbin, Saman Amarasinghe
2020 arXiv   pre-print
In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks, both of which are currently outside of the scope of existing neural network compilers (sparse neural networks  ...  We evaluate our approach on a set of deep learning benchmarks and compare our results with hand-optimized industrial libraries.  ...  The CPU evaluation is performed on an 8-core Intel i7-6700HQ CPU, 16 GB RAM, Ubuntu 18.04. The GPU evaluation is performed on an Nvidia Pascal P4 GPU.  ... 
arXiv:2005.04091v1 fatcat:zqeblrvhqjh6xjy6i6nquualza

Sparse GPU Kernels for Deep Learning [article]

Trevor Gale, Matei Zaharia, Cliff Young, Erich Elsen
2020 arXiv   pre-print
While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing  ...  Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense  ...  ACKNOWLEDGEMENTS We are grateful to Rasmus Larsen and Deepak Narayanan for providing detailed feedback on drafts of this paper.  ... 
arXiv:2006.10901v2 fatcat:76wdsepdlffslgz3kkuxykwv5i
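The two operations named in this snippet are easy to state concretely. SpMM multiplies a sparse matrix by a dense one. SDDMM evaluates a dense-dense product only at the nonzero positions of a sparse mask. The following is a minimal pure-Python reference sketch, not the paper's GPU kernels. It assumes a CSR (values, column indices, row offsets) layout and scales each sampled output by the mask value, which is one common SDDMM definition. The function names `spmm` and `sddmm` are hypothetical.

```python
from typing import List, Tuple

# CSR layout (assumed): (values, column indices, row offsets);
# row i owns entries row_ptr[i] .. row_ptr[i + 1] - 1.
CSR = Tuple[List[float], List[int], List[int]]
Dense = List[List[float]]


def spmm(a: CSR, b: Dense) -> Dense:
    """Sparse x dense -> dense: C = A @ B, visiting only the stored nonzeros of A."""
    values, cols, row_ptr = a
    n_rows, n_out = len(row_ptr) - 1, len(b[0])
    c = [[0.0] * n_out for _ in range(n_rows)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            v, j = values[k], cols[k]
            for t in range(n_out):
                c[i][t] += v * b[j][t]
    return c


def sddmm(mask: CSR, x: Dense, y: Dense) -> CSR:
    """Sampled dense-dense: (X @ Y^T) evaluated only where mask is nonzero, scaled by mask values."""
    values, cols, row_ptr = mask
    out = []
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = cols[k]
            dot = sum(xi * yj for xi, yj in zip(x[i], y[j]))
            out.append(values[k] * dot)
    # The result keeps the mask's sparsity pattern.
    return out, list(cols), list(row_ptr)


# A = [[1, 0, 2], [0, 3, 0]] in CSR form
A = ([1.0, 2.0, 3.0], [0, 2, 1], [0, 2, 3])
print(spmm(A, [[1, 2], [3, 4], [5, 6]]))                   # [[11.0, 14.0], [9.0, 12.0]]
print(sddmm(A, [[1, 1], [2, 0]], [[1, 0], [0, 1], [1, 1]])[0])  # [1.0, 4.0, 0.0]
```

Both functions touch only stored nonzeros, which is where the savings come from. A GPU kernel must also balance work across rows with uneven nonzero counts, which this sketch ignores.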

Deep Learning for Consumer Devices and Services: Pushing the limits for machine learning, artificial intelligence, and computer vision

Joe Lemley, Shabab Bazrafkan, Peter Corcoran
2017 IEEE Consumer Electronics Magazine  
The thing we want our network to learn to do is called the "task". When training Artificial Neural Networks, we want the network to perform well at a given task on unseen information.  ...  A typical GPU has hundreds or thousands of cores, and although each core is much slower than a typical CPU core, together they are able to train networks (especially deep neural networks) at the level  ... 
doi:10.1109/mce.2016.2640698 fatcat:k4bdd7zvbrckjle2ni7ck4pxq4

Truly Sparse Neural Networks at Scale [article]

Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy
2022 arXiv   pre-print
All in all, we are able to break the record and to train the largest neural network ever trained in terms of representational power -- reaching the bat brain size.  ...  In this paper, we take an orthogonal approach, and we show that we can train truly sparse neural networks to harvest their full potential.  ...  Acknowledgement We thank the Google Cloud Platform Research Credits program for granting us the necessary resources to run the extremely large sparse MLP experiments.  ... 
arXiv:2102.01732v2 fatcat:xw4pnoj5zfafvilmk34odczt5m

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems [article]

Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, Ping Li
2020 arXiv   pre-print
All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system.  ...  Neural networks of ads systems usually take input from multiple resources, e.g., query-ad relevance, ad features and user portraits.  ...  Data transfer and SSD I/O bandwidth are relatively slow compared with the deep neural network training on GPUs.  ... 
arXiv:2003.05622v1 fatcat:kfl2uv7oarfsfa7zpkgps76h6e

Scaling the training of particle classification on simulated MicroBooNE events to multiple GPUs [article]

Alex Hagen, Eric Church, Jan Strube, Kolahal Bhattacharya, Vinay Amatya
2020 arXiv   pre-print
However, such efforts lead to extremely long training cycles, which slow down the exploration of new network architectures and hyperparameter scans to improve the classification performance.  ...  Ideally, training would occur on many instances of the entire event data, instead of many instances of cropped regions of interest from the event data.  ...  Acknowledgements The authors gratefully acknowledge the MicroBooNE collaboration for permission to work on simulated LArTPC data to focus on compute resources and performance scaling.  ... 
arXiv:2004.08439v1 fatcat:5quss4o7xzdbvkczb5r6zg3rlq

Doing Scientific Machine Learning with Julia's SciML Ecosystem [article]

Christopher Rackauckas
Equations for Scientific Machine Learning, Physics-Informed Neural Networks (Physics-informed neural networks: A deep learning framework for solving forward and inverse  ...  how to model the missing part of a physical simulation, describe how universal approximators (neural networks) can be used in this context, and show how to transform such problems into an optimization  ...  Neural Networks: Deep Learning of High-dimensional Partial Differential Equations Maziar Raissi UDEs are a BLAS/LAPACK of SciML Scientific Machine Learning requires efficient and accurate training  ... 
doi:10.6084/m9.figshare.12751949.v1 fatcat:3nodxm7ghzftflwbmlrtnhf5tu

Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks [article]

Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun
2019 arXiv   pre-print
The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL).  ...  Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on both dense and sparse models while requiring little effort from its users.  ...  Introduction It is a common practice nowadays for deep learning (DL) practitioners to utilize a cluster of GPU resources for training deep neural networks.  ... 
arXiv:1808.02621v3 fatcat:flymv2t6lnh23pkzivelnjby44

Winning the Lottery Ahead of Time: Efficient Early Network Pruning [article]

John Rachwan, Daniel Zügner, Bertrand Charpentier, Simon Geisler, Morgane Ayle, Stephan Günnemann
2022 arXiv   pre-print
Pruning, the task of sparsifying deep neural networks, has received increasing attention recently.  ...  This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements.  ...  Due to its decreased performance on large network/dataset combinations, the LTH was later revised for very deep networks.  ... 
arXiv:2206.10451v1 fatcat:et3yzojg3bhzzcoc6jpmnydtqu
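As a concrete picture of what "sparsifying" a network means, the sketch below implements global magnitude pruning, a common baseline that zeroes the weights with the smallest absolute values. It is a swapped-in illustration, not the early-pruning method of Rachwan et al. The hypothetical `magnitude_prune` function and the flat list of weights are assumptions made for brevity.

```python
from typing import List, Tuple


def magnitude_prune(weights: List[float], sparsity: float) -> Tuple[List[float], List[int]]:
    """Zero out the fraction `sparsity` of weights with the smallest absolute value.

    Returns the pruned weights and a 0/1 mask (1 = kept).
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(sparsity * len(weights))
    # Indices from smallest to largest magnitude; sorted() is stable, so ties keep index order.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


pruned, mask = magnitude_prune([0.5, -0.1, 2.0, 0.05], sparsity=0.5)
print(mask)    # [1, 0, 1, 0]
print(pruned)  # [0.5, 0.0, 2.0, 0.0]
```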

A Review of Deep Learning Research

2019 KSII Transactions on Internet and Information Systems  
processing, speech recognition and online advertising and so on.  ...  of deep learning; Finally, we introduce the latest acceleration technology of deep learning and highlight the future work of deep learning.  ...  Acknowledgements We thank the anonymous referees for their helpful comments and suggestions on the initial version of this paper.  ... 
doi:10.3837/tiis.2019.04.001 fatcat:tefkvk3fvvanbkzwmjn44eoxsu

On optimization methods for deep learning

Quoc V. Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, Andrew Y. Ng
2011 International Conference on Machine Learning  
Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks.  ...  The predominant methodology in training deep learning advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize.  ...  This work is supported by the DARPA Deep Learning program under contract number FA8650-10-C-7020.  ... 
dblp:conf/icml/LeNCLPN11 fatcat:s4m4aokdevd6dc5lumiuqulnvu
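The snippet contrasts SGD with L-BFGS. A minimal single-machine sketch of textbook L-BFGS follows: the two-loop recursion turns a short history of (s, y) pairs into a search direction, and a backtracking Armijo line search picks the step. The history size `m`, the Armijo constant 1e-4, and the function names `lbfgs_direction` and `lbfgs_minimize` are assumptions for illustration. It does not reproduce the distributed setup evaluated in the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(g: Vector, s_hist: List[Vector], y_hist: List[Vector]) -> Vector:
    """Two-loop recursion: returns -H g, where H approximates the inverse Hessian
    from stored pairs s = x_{k+1} - x_k, y = g_{k+1} - g_k (oldest first)."""
    q = list(g)
    saved = []
    for s, y in zip(reversed(s_hist), reversed(y_hist)):  # newest to oldest
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        saved.append((rho, alpha))
    # Initial inverse-Hessian guess gamma * I, gamma = s^T y / y^T y from the newest pair.
    gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
    r = [gamma * qi for qi in q]
    for (s, y), (rho, alpha) in zip(zip(s_hist, y_hist), reversed(saved)):  # oldest to newest
        beta = rho * dot(y, r)
        r = [ri + si * (alpha - beta) for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs_minimize(f: Callable[[Vector], float], grad: Callable[[Vector], Vector],
                   x0: Vector, m: int = 5, iters: int = 100, tol: float = 1e-8) -> Vector:
    x, g = list(x0), grad(x0)
    s_hist: List[Vector] = []
    y_hist: List[Vector] = []
    for _ in range(iters):
        if dot(g, g) ** 0.5 < tol:
            break
        d = lbfgs_direction(g, s_hist, y_hist)
        # Backtracking line search on the Armijo (sufficient decrease) condition.
        t, fx, slope = 1.0, f(x), dot(g, d)
        while f([xi + t * di for xi, di in zip(x, d)]) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = [xi + t * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # keep only pairs with positive curvature
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:  # limited memory: drop the oldest pair
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


def f(v: Vector) -> float:
    # Ill-conditioned quadratic with its minimum at (1, -2).
    return (v[0] - 1) ** 2 + 10 * (v[1] + 2) ** 2


def grad_f(v: Vector) -> Vector:
    return [2 * (v[0] - 1), 20 * (v[1] + 2)]


print(lbfgs_minimize(f, grad_f, [0.0, 0.0]))
```

Unlike SGD, each iteration here uses a full (deterministic) gradient, which is what makes the curvature pairs meaningful.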

A New Approach for Sparse Matrix Classification Based on Deep Learning Techniques

Juan C. Pichel, Beatriz Pateiro-Lopez
2018 IEEE International Conference on Cluster Computing (CLUSTER)  
Considering GPUs as target platforms, the trained CNN selects the best storage format 90.1% of the time, obtaining 99.4% of the highest SpMV performance among the tested formats.  ...  As a consequence, we generate image datasets that include enough information to successfully train a Convolutional Neural Network (CNN).  ...  DIGITS allows users to design, train and visualize deep neural networks for image classification, taking advantage of the deep learning framework Caffe.  ... 
doi:10.1109/cluster.2018.00017 dblp:conf/cluster/PichelP18 fatcat:fanbx4zzgjhzvjfsxto7oywg7u
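"Storage format" here refers to how a sparse matrix is laid out in memory for SpMV. The sketch below computes the same SpMV from two common formats, COO and CSR, to show that the result is identical while the access pattern differs: COO scatters writes into the output vector, while CSR computes one contiguous dot product per row. The helper names are hypothetical, and the sketch does not model the CNN-based format selection or GPU performance.

```python
from typing import List, Tuple

Dense = List[List[float]]


def dense_to_coo(m: Dense) -> Tuple[List[int], List[int], List[float]]:
    """COO: one (row, col, value) triple per nonzero, in row-major order."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(m):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals


def coo_to_csr(rows: List[int], cols: List[int], vals: List[float], n_rows: int):
    """CSR: compress the (row-sorted) row indices into n_rows + 1 offsets."""
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    return list(vals), list(cols), row_ptr


def spmv_coo(rows: List[int], cols: List[int], vals: List[float],
             x: List[float], n_rows: int) -> List[float]:
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]  # scattered writes into y
    return y


def spmv_csr(vals: List[float], cols: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    # One dot product per row; each row's nonzeros are stored contiguously.
    return [sum((vals[k] * x[cols[k]] for k in range(row_ptr[i], row_ptr[i + 1])), 0.0)
            for i in range(len(row_ptr) - 1)]


m = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [1.0, 2.0, 0.0]]
x = [1.0, 2.0, 3.0]
rows, cols, vals = dense_to_coo(m)
print(spmv_coo(rows, cols, vals, x, len(m)))               # [4.0, 15.0, 5.0]
print(spmv_csr(*coo_to_csr(rows, cols, vals, len(m)), x))  # [4.0, 15.0, 5.0]
```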

Bridging the Gap between Memory and Communication Efficiency on Distributed Deep Learning Systems

Shaofeng Zhao, Bo Liu, Fang Wang, Dan Feng
2021 IEEE Access  
Compared with baseline systems using only a single strategy, LaySA can help to reduce the system memory usage by up to 80.5%, and the overall training time of the neural network models on a single GPU  ...  utilization of multiple resources simultaneously, especially for extreme-scale deep neural networks.  ...  Furthermore, the larger the Deep Neural Network (DNN), the more resources the system needs.  ... 
doi:10.1109/access.2021.3071579 fatcat:bnfyud7ih5cfnbcsuwwq4ehhn4

Improving Neural Network with Uniform Sparse Connectivity

Weijun Luo
2020 IEEE Access  
Neural network forms the foundation of deep learning and numerous AI applications. Classical neural networks are fully connected, expensive to train and prone to overfitting.  ...  USN has one striking property: its performance is independent of the substantial topology variation and enormous model space, and it thus offers a search-free solution to all the above-mentioned issues of neural  ...  INTRODUCTION Neural network (NN) or artificial neural network (ANN) is one of the most popular machine learning (ML) frameworks, and forms the foundation of most artificial intelligence (AI) and deep learning  ... 
doi:10.1109/access.2020.3040943 fatcat:pcsltcqrdnhq3cqrcu6mt3vzpi

Large-Scale Shape Retrieval with Sparse 3D Convolutional Neural Networks [article]

Alexandr Notchenko, Ermek Kapushev, Evgeny Burnaev
2017 arXiv   pre-print
In this paper we present results of performance evaluation of S3DCNN - a Sparse 3D Convolutional Neural Network - on a large-scale 3D Shape benchmark ModelNet40, and measure how it is impacted by voxel  ...  We also notice that the benefits of higher input resolution can be limited by the ability of a neural network to generalize high level features.  ...  In this work, we present Sparse 3D Deep Convolutional Neural Networks and explore their ability to perform large-scale shape retrieval on the popular benchmark ModelNet40 [22] depending on an input resolution  ... 
arXiv:1611.09159v2 fatcat:2px6e4vgzjggtgcto4hpuo4zy4
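The dependence on voxel resolution is easier to see with a small example. For points sampled along a thin structure, the number of occupied voxels grows far more slowly than the resolution^3 grid, so the occupied fraction drops as resolution increases; that sparsity is what a sparse 3D CNN can exploit. The sketch below assumes points already normalized to the unit cube, and the functions `voxelize` and `occupancy` are hypothetical. It is not the S3DCNN pipeline.

```python
from typing import Iterable, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, ...]


def voxelize(points: Iterable[Point], resolution: int) -> Set[Voxel]:
    """Map points in the unit cube [0, 1]^3 to the set of occupied voxel indices."""
    occupied = set()
    for p in points:
        # Clamp so that a coordinate of exactly 1.0 lands in the last voxel.
        occupied.add(tuple(min(int(c * resolution), resolution - 1) for c in p))
    return occupied


def occupancy(points: Iterable[Point], resolution: int) -> float:
    """Fraction of the resolution^3 grid that is occupied (what a dense 3D CNN would process)."""
    return len(voxelize(points, resolution)) / resolution ** 3


# Points sampled along a thin rod through the cube.
rod = [(i / 100, 0.5, 0.5) for i in range(100)]
for res in (10, 20, 40):
    print(res, len(voxelize(rod, res)), occupancy(rod, res))
```

The rod occupies 10, 20 and 40 voxels at these resolutions, so the occupied fraction falls by a factor of four each time the resolution doubles.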