TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning
[article]
2020
arXiv
pre-print
In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks, both of which are currently outside of the scope of existing neural network compilers (sparse neural networks ...
We evaluate our approach on a set of deep learning benchmarks and compare our results with hand-optimized industrial libraries. ...
The CPU evaluation is performed on an 8-core Intel i7-6700HQ CPU, 16 GB RAM, Ubuntu 18.04. The GPU evaluation is performed on an Nvidia Pascal P4 GPU. ...
arXiv:2005.04091v1
fatcat:zqeblrvhqjh6xjy6i6nquualza
Sparse GPU Kernels for Deep Learning
[article]
2020
arXiv
pre-print
While deep neural networks can be made sparse, achieving practical speedups on GPUs is difficult because these applications have relatively moderate levels of sparsity that are not sufficient for existing ...
Based on these insights, we develop high-performance GPU kernels for two sparse matrix operations widely applicable in neural networks: sparse matrix-dense matrix multiplication and sampled dense-dense ...
ACKNOWLEDGEMENTS We are grateful to Rasmus Larsen and Deepak Narayanan for providing detailed feedback on drafts of this paper. ...
arXiv:2006.10901v2
fatcat:76wdsepdlffslgz3kkuxykwv5i
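The two kernels named above are easiest to read as plain array math. Below is a minimal NumPy/SciPy sketch of their semantics (a CPU reference only, not the paper's GPU kernels); the matrix shapes and the reuse of A's pattern as the SDDMM sampling mask are illustrative assumptions.

```python
# Reference semantics for the two sparse ops named in the abstract:
# SpMM (sparse x dense) and SDDMM (sampled dense-dense matmul).
# CPU sketch of what the GPU kernels compute, not the kernels themselves.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# A: sparse weight matrix (e.g. a pruned layer), ~90% zeros.
A = sp.random(256, 512, density=0.1, format="csr", random_state=0)
B = rng.standard_normal((512, 64))      # dense activations

# SpMM: C = A @ B; only nonzeros of A contribute.
C = A @ B

# SDDMM: compute (X @ Y) only at the nonzero positions of a sparsity mask S.
X = rng.standard_normal((256, 32))
Y = rng.standard_normal((32, 512))
S = A                                   # reuse A's pattern as the sampling mask
rows, cols = S.nonzero()
vals = np.einsum("ij,ij->i", X[rows], Y[:, cols].T)   # one dot product per nonzero
D = sp.csr_matrix((vals, (rows, cols)), shape=S.shape)

print(C.shape, D.nnz)
```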
Deep Learning for Consumer Devices and Services: Pushing the limits for machine learning, artificial intelligence, and computer vision
2017
IEEE Consumer Electronics Magazine
The thing we want our network to learn to do is called the "task". When training Artificial Neural Networks, we want the network to perform well at a given task on unseen information. ...
A typical GPU has hundreds or thousands of cores, and although each core is much slower than a typical CPU core, together they are able to train networks (especially deep neural networks) at the level ...
doi:10.1109/mce.2016.2640698
fatcat:k4bdd7zvbrckjle2ni7ck4pxq4
Truly Sparse Neural Networks at Scale
[article]
2022
arXiv
pre-print
All in all, we are able to break the record and train the largest neural network ever trained in terms of representational power -- reaching the bat brain size. ...
In this paper, we take an orthogonal approach, and we show that we can train truly sparse neural networks to harvest their full potential. ...
Acknowledgement We thank the Google Cloud Platform Research Credits program for granting us the necessary resources to run the Extreme large sparse MLPs experiments. ...
arXiv:2102.01732v2
fatcat:xw4pnoj5zfafvilmk34odczt5m
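As a rough illustration of what "truly sparse" training implies for storage, here is a sketch of a layer whose weights live only in a SciPy CSR matrix, so memory scales with nonzeros rather than with the dense dimensions; the class name, sizes, and density are assumptions, not the authors' implementation.

```python
# Sketch of a "truly sparse" layer: the weight matrix is stored only in sparse
# form (CSR), so memory scales with nonzeros, not n_in * n_out.
import numpy as np
import scipy.sparse as sp

class SparseLinear:
    def __init__(self, n_in, n_out, density=0.01, seed=0):
        self.W = sp.random(n_out, n_in, density=density, format="csr",
                           random_state=seed)
        self.b = np.zeros(n_out)

    def forward(self, x):                  # x: (batch, n_in)
        return (self.W @ x.T).T + self.b   # sparse matmul, never densified

layer = SparseLinear(n_in=10_000, n_out=1_000, density=0.01)
x = np.random.default_rng(0).standard_normal((4, 10_000))
y = layer.forward(x)
print(y.shape, layer.W.nnz)   # (4, 1000); ~1e5 nonzeros instead of 1e7 dense weights
```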
Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems
[article]
2020
arXiv
pre-print
All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system. ...
Neural networks of ads systems usually take input from multiple sources, e.g., query-ad relevance, ad features and user portraits. ...
Data transfer and SSD I/O bandwidth are relatively slow compared with deep neural network training on GPUs. ...
arXiv:2003.05622v1
fatcat:kfl2uv7oarfsfa7zpkgps76h6e
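The entry describes a parameter hierarchy in which SSD I/O is the slow path. A toy Python sketch of such a tiered store follows; every name here (HierarchicalParams, the tier capacities, the shelve-backed SSD tier) is hypothetical and only meant to show why lookups that miss the in-memory tiers are expensive.

```python
# Toy sketch of a hierarchical parameter store for huge embedding tables:
# hot parameters cached in GPU memory, warm ones in host RAM, cold ones on SSD.
# All class/function names are illustrative, not the paper's API.
import shelve
from collections import OrderedDict

class HierarchicalParams:
    def __init__(self, ssd_path, gpu_capacity=1_000, host_capacity=100_000):
        self.gpu = OrderedDict()               # tier 0: fastest, smallest (LRU)
        self.host = OrderedDict()              # tier 1
        self.ssd = shelve.open(ssd_path)       # tier 2: slowest, largest
        self.gpu_cap, self.host_cap = gpu_capacity, host_capacity

    def lookup(self, key):
        for tier in (self.gpu, self.host):
            if key in tier:
                tier.move_to_end(key)          # refresh LRU position
                return tier[key]
        value = self.ssd[key]                  # slow path: SSD I/O
        self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.gpu[key] = value
        if len(self.gpu) > self.gpu_cap:       # spill LRU entry one tier down
            old_k, old_v = self.gpu.popitem(last=False)
            self.host[old_k] = old_v
            if len(self.host) > self.host_cap:
                k2, v2 = self.host.popitem(last=False)
                self.ssd[k2] = v2

store = HierarchicalParams("params_on_ssd")    # shelve keys must be strings
store.ssd["feat:42"] = [0.1, 0.2, 0.3]         # cold parameter starts on SSD
print(store.lookup("feat:42"))                 # first access pays SSD cost, then cached
```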
Scaling the training of particle classification on simulated MicroBooNE events to multiple GPUs
[article]
2020
arXiv
pre-print
However, such efforts lead to extremely long training cycles, which slow down the exploration of new network architectures and hyperparameter scans to improve the classification performance. ...
Ideally, training would occur on many instances of the entire event data, instead of many instances of cropped regions of interest from the event data. ...
Acknowledgements The authors gratefully acknowledge the MicroBooNE collaboration for permission to work on simulated LArTPC data to focus on compute resources and performance scaling. ...
arXiv:2004.08439v1
fatcat:5quss4o7xzdbvkczb5r6zg3rlq
Doing Scientific Machine Learning with Julia's SciML Ecosystem
[article]
2020
figshare.com
Equations for Scientific Machine Learning (https://arxiv.org/abs/2001.04385)), Physics-Informed Neural Networks (Physics-informed neural networks: A deep learning framework for solving forward and inverse ...
how to model the missing part of a physical simulation, describe how universal approximators (neural networks) can be used in this context, and show how to transform such problems into an optimization ...
Neural Networks: Deep Learning of High-dimensional Partial Differential Equations, Maziar Raissi. UDEs are a BLAS/LAPACK of SciML. Scientific Machine Learning requires efficient and accurate training ...
doi:10.6084/m9.figshare.12751949.v1
fatcat:3nodxm7ghzftflwbmlrtnhf5tu
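The snippet above describes universal differential equations: a known physical model plus a neural network standing in for the missing dynamics, fitted as an optimization problem. A minimal NumPy/SciPy sketch of that structure follows; the SciML ecosystem itself is Julia, and the Lotka-Volterra terms and the untrained two-layer MLP here are assumptions for illustration only.

```python
# Structure of a universal differential equation (UDE): known physics plus a
# neural-network term for the missing dynamics. Sketch of the idea only;
# training would fit the MLP parameters to trajectory data.
import numpy as np
from scipy.integrate import solve_ivp

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)) * 0.1, np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)) * 0.1, np.zeros(1)

def nn_term(u):
    """Tiny MLP standing in for the unknown part of the dynamics."""
    h = np.tanh(W1 @ u + b1)
    return (W2 @ h + b2)[0]

def rhs(t, u):
    x, y = u
    dx = 1.3 * x - 0.9 * x * y          # known Lotka-Volterra physics
    dy = -1.8 * y + nn_term(u)          # missing interaction term modeled by the NN
    return [dx, dy]

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 1.0], max_step=0.1)
print(sol.y.shape)                       # simulated trajectory of the hybrid model
```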
Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
[article]
2019
arXiv
pre-print
The employment of high-performance servers and GPU accelerators for training deep neural network models has greatly accelerated recent advances in deep learning (DL). ...
Experiments show that Parallax built atop TensorFlow achieves scalable training throughput on both dense and sparse models while requiring little effort from its users. ...
Introduction It is a common practice nowadays for deep learning (DL) practitioners to utilize a cluster of GPU resources for training deep neural networks. ...
arXiv:1808.02621v3
fatcat:flymv2t6lnh23pkzivelnjby44
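Parallax's key idea, per the abstract, is choosing the aggregation path by variable sparsity. The sketch below is a purely illustrative Python dispatcher (hypothetical names, no TensorFlow) that sums dense gradients all-reduce style and applies sparse (index, value) gradients through a parameter-server-like scatter update.

```python
# Illustrative sketch of sparsity-aware hybrid aggregation: dense gradients go
# through all-reduce, sparse (index, value) gradients go to a parameter server.
# Hypothetical names; Parallax itself does this on top of TensorFlow.
import numpy as np

def allreduce(dense_grads):
    """Sum identically shaped dense gradients across workers."""
    return np.sum(dense_grads, axis=0)

class ParameterServer:
    def __init__(self, table):
        self.table = table                           # e.g. an embedding matrix

    def apply_sparse(self, indices, values, lr=0.1):
        np.add.at(self.table, indices, -lr * values) # scatter-add update per row

# Two workers: dense gradient for a weight matrix, sparse gradient for embeddings.
dense_w = [np.ones((4, 4)), 2 * np.ones((4, 4))]
w_grad = allreduce(dense_w)                          # all-reduce path

ps = ParameterServer(table=np.zeros((10, 3)))
for idx, val in [(np.array([1, 7]), np.ones((2, 3))),
                 (np.array([7, 2]), np.ones((2, 3)))]:
    ps.apply_sparse(idx, val)                        # parameter-server path
print(w_grad[0, 0], ps.table[7])
```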
Winning the Lottery Ahead of Time: Efficient Early Network Pruning
[article]
2022
arXiv
pre-print
Pruning, the task of sparsifying deep neural networks, has received increasing attention recently. ...
This enables us to train sparse networks on commodity GPUs whose dense versions would be too large, thereby saving costs and reducing hardware requirements. ...
Due to its decreased performance on large network/dataset combinations, the LTH was later revised for very deep networks. ...
arXiv:2206.10451v1
fatcat:et3yzojg3bhzzcoc6jpmnydtqu
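For readers unfamiliar with pruning, a standard global magnitude-pruning baseline (a common point of comparison in this literature, not EarlyCroP's specific early criterion) looks like the following sketch.

```python
# Global magnitude pruning baseline: keep the largest |w| globally, zero the
# rest, and reuse the resulting binary masks as the sparse network structure.
import numpy as np

def global_magnitude_prune(weights, sparsity=0.9):
    """weights: list of arrays. Returns binary masks zeroing a `sparsity` fraction."""
    all_w = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_w, sparsity)         # global magnitude cut-off
    return [(np.abs(w) >= threshold).astype(w.dtype) for w in weights]

rng = np.random.default_rng(0)
layers = [rng.standard_normal((128, 64)), rng.standard_normal((64, 10))]
masks = global_magnitude_prune(layers, sparsity=0.9)
pruned = [w * m for w, m in zip(layers, masks)]
kept = sum(m.sum() for m in masks) / sum(m.size for m in masks)
print(f"kept {kept:.1%} of weights")                 # roughly 10%
```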
A Review of Deep Learning Research
2019
KSII Transactions on Internet and Information Systems
processing, speech recognition, online advertising, and so on. ...
of deep learning; finally, we introduce the latest acceleration technology of deep learning and highlight the future work of deep learning. ...
Acknowledgements We thank the anonymous referees for their helpful comments and suggestions on the initial version of this paper. ...
doi:10.3837/tiis.2019.04.001
fatcat:tefkvk3fvvanbkzwmjn44eoxsu
On optimization methods for deep learning
2011
International Conference on Machine Learning
Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks. ...
The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. ...
This work is supported by the DARPA Deep Learning program under contract number FA8650-10-C-7020. ...
dblp:conf/icml/LeNCLPN11
fatcat:s4m4aokdevd6dc5lumiuqulnvu
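The contrast drawn in the abstract, many cheap noisy SGD steps versus batch methods such as L-BFGS that exploit curvature, can be seen on a toy logistic-regression loss; the sketch below uses scipy.optimize's L-BFGS-B and a hand-rolled minibatch SGD loop, with arbitrary step counts and learning rate.

```python
# Toy comparison on a logistic-regression loss: hand-rolled minibatch SGD
# versus scipy's full-batch L-BFGS-B.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
true_w = rng.standard_normal(20)
y = (X @ true_w + 0.1 * rng.standard_normal(500) > 0).astype(float)

def loss_and_grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# SGD: many cheap, noisy minibatch steps.
w = np.zeros(20)
for step in range(2000):
    idx = rng.integers(0, len(y), size=32)
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    w -= 0.1 * X[idx].T @ (p - y[idx]) / len(idx)

# L-BFGS: few expensive full-batch steps using curvature information.
res = minimize(loss_and_grad, np.zeros(20), jac=True, method="L-BFGS-B")
print(loss_and_grad(w)[0], res.fun)
```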
A New Approach for Sparse Matrix Classification Based on Deep Learning Techniques
2018
2018 IEEE International Conference on Cluster Computing (CLUSTER)
Considering GPUs as target platforms, the trained CNN selects the best storage format 90.1% of the time, obtaining 99.4% of the highest SpMV performance among the tested formats. ...
As a consequence, we generate image datasets that include enough information to successfully train a Convolutional Neural Network (CNN). ...
DIGITS allows one to design, train and visualize deep neural networks for image classification, taking advantage of the deep learning framework Caffe. ...
doi:10.1109/cluster.2018.00017
dblp:conf/cluster/PichelP18
fatcat:fanbx4zzgjhzvjfsxto7oywg7u
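The approach above hinges on turning a sparse matrix into a fixed-size image a CNN can consume. A sketch of that preprocessing step follows (downsampling the nonzero pattern into a density grid; the 128x128 size is an assumption, not the paper's exact pipeline).

```python
# The paper feeds images of sparsity patterns to a CNN that picks the best
# storage format. Sketch of the preprocessing step only: downsample an
# arbitrary sparse matrix's nonzero pattern into a fixed-size density image.
import numpy as np
import scipy.sparse as sp

def sparsity_image(A, size=128):
    A = A.tocoo()
    img, _, _ = np.histogram2d(A.row, A.col,
                               bins=size,
                               range=[[0, A.shape[0]], [0, A.shape[1]]])
    return img / max(img.max(), 1.0)      # normalized nonzero density per cell

A = sp.random(5000, 5000, density=0.001, format="csr", random_state=0)
img = sparsity_image(A)
print(img.shape)   # (128, 128) CNN input, e.g. for a network trained with DIGITS/Caffe
```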
Bridging the Gap between Memory and Communication Efficiency on Distributed Deep Learning Systems
2021
IEEE Access
Compared with baseline systems using only a single strategy, LaySA can help to reduce the system memory usage by up to 80.5%, and the overall training time of the neural network models on a single GPU ...
utilization of multiple resources simultaneously, especially for extreme-scale deep neural networks. ...
Furthermore, the larger the Deep Neural Network (DNN), the more resources the system needs. ...
doi:10.1109/access.2021.3071579
fatcat:bnfyud7ih5cfnbcsuwwq4ehhn4
Improving Neural Network with Uniform Sparse Connectivity
2020
IEEE Access
Neural network forms the foundation of deep learning and numerous AI applications. Classical neural networks are fully connected, expensive to train and prone to overfitting. ...
USN has one striking property: its performance is independent of the substantial topology variation and enormous model space, thus offering a search-free solution to all the above-mentioned issues of neural ...
INTRODUCTION A neural network (NN), or artificial neural network (ANN), is one of the most popular machine learning (ML) frameworks and forms the foundation of most artificial intelligence (AI) and deep learning ...
doi:10.1109/access.2020.3040943
fatcat:pcsltcqrdnhq3cqrcu6mt3vzpi
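A minimal sketch of the uniform-connectivity idea, assuming each output unit receives exactly k randomly chosen inputs; the paper's construction details differ, and this only illustrates the mask shape and connection count.

```python
# Sketch of uniform sparse connectivity: every output unit gets exactly k
# incoming connections, chosen at random, instead of a fully connected layer.
import numpy as np

def uniform_sparse_mask(n_in, n_out, k, seed=0):
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=np.float32)
    for j in range(n_out):
        cols = rng.choice(n_in, size=k, replace=False)   # k distinct inputs per unit
        mask[j, cols] = 1.0
    return mask

mask = uniform_sparse_mask(n_in=784, n_out=256, k=16)
W = np.random.default_rng(1).standard_normal((256, 784)).astype(np.float32) * mask
print(int(mask.sum(axis=1).min()), int(mask.sum(axis=1).max()))  # every row has exactly 16
print(W.size, int(mask.sum()))           # dense parameter count vs. actual connections
```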
Large-Scale Shape Retrieval with Sparse 3D Convolutional Neural Networks
[article]
2017
arXiv
pre-print
In this paper we present results of performance evaluation of S3DCNN - a Sparse 3D Convolutional Neural Network - on a large-scale 3D Shape benchmark ModelNet40, and measure how it is impacted by voxel ...
We also notice that benefits of higher input resolution can be limited by an ability of a neural network to generalize high level features. ...
In this work, we present Sparse 3D Deep Convolutional Neural Networks and explore their ability to perform large-scale shape retrieval on the popular benchmark ModelNet40 [22] depending on an input resolution ...
arXiv:1611.09159v2
fatcat:2px6e4vgzjggtgcto4hpuo4zy4
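Sparse 3D CNNs such as S3DCNN consume voxel grids that are almost entirely empty, and the abstract studies how input resolution affects retrieval. The sketch below voxelizes a point cloud into a sparse occupancy list at several resolutions; the random stand-in "shape" and the coordinate-list representation are assumptions for illustration.

```python
# Voxelization step for a sparse 3D CNN: turn a point cloud into a sparse list
# of occupied voxel coordinates at a chosen resolution. Illustrative only.
import numpy as np

def voxelize(points, resolution=32):
    """points: (N, 3) array. Returns unique occupied voxel coordinates."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    scaled = (points - lo) / np.maximum(hi - lo, 1e-9) * (resolution - 1)
    coords = np.unique(np.floor(scaled).astype(np.int32), axis=0)
    return coords                                   # (M, 3), M << resolution**3

points = np.random.default_rng(0).standard_normal((10_000, 3))  # stand-in shape
for res in (16, 32, 64):                            # the paper varies input resolution
    occ = voxelize(points, res)
    print(res, len(occ), res**3, f"{len(occ) / res**3:.1%} occupied")
```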
Showing results 1 — 15 out of 27,077 results