573 Hits in 4.2 sec

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks [article]

Xiaohan Ding, Guiguang Ding, Xiangxin Zhou, Yuchen Guo, Jungong Han, Ji Liu
2019 arXiv   pre-print
In this paper, we propose a novel momentum-SGD-based optimization method to reduce the network complexity by on-the-fly pruning.  ...  Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, thus impeding their practical usage on resource-constrained front-end devices.  ...  Acknowledgement We sincerely thank all the reviewers for their comments. This work was supported by the National Key  ... 
arXiv:1909.12778v3 fatcat:hh4wnkm2xvdj3nianidyliol6e
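The on-the-fly pruning idea this abstract describes — letting only the most salient weights follow the objective gradient while the rest decay toward zero — can be illustrated roughly as follows. This is a minimal NumPy sketch, not the paper's algorithm: the saliency criterion |w · grad|, the hyperparameters, and all names are assumptions made for illustration.

```python
import numpy as np

def gsm_step(w, grad, velocity, lr=0.1, momentum=0.9, wd=1e-4, q=2):
    """One momentum-SGD step in the spirit of global sparse momentum:
    only the q globally most salient weights receive the objective
    gradient; the rest receive weight decay alone, so they shrink
    toward zero and can be pruned on the fly."""
    saliency = np.abs(w * grad)               # assumed saliency measure
    active = np.zeros_like(w, dtype=bool)
    active[np.argsort(saliency)[-q:]] = True  # top-q weights stay "active"
    update = wd * w + np.where(active, grad, 0.0)
    velocity = momentum * velocity + update
    return w - lr * velocity, velocity
```

Over many steps the passive weights decay geometrically, so a final magnitude threshold yields the sparse network without a separate pruning phase.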

Truly Sparse Neural Networks at Scale [article]

Selima Curci, Decebal Constantin Mocanu, Mykola Pechenizkiy
2022 arXiv   pre-print
To achieve this goal, we introduce three novel contributions, specially designed for sparse neural networks: (1) a parallel training algorithm and its corresponding sparse implementation from scratch,  ...  Recently, sparse training methods have started to be established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency is just in theory.  ...  Acknowledgement We thank the Google Cloud Platform Research Credits program for granting us the necessary resources to run the extremely large sparse MLP experiments.  ... 
arXiv:2102.01732v2 fatcat:xw4pnoj5zfafvilmk34odczt5m

Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces [article]

Yanwei Fu and Chen Liu and Donghao Li and Zuyuan Zhong and Xinwei Sun and Jinshan Zeng and Yuan Yao
2022 arXiv   pre-print
forward selection methods for learning structural sparsity in deep networks.  ...  The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability.  ...  For ResNet50, we find an interesting phenomenon: most layers inside each block can be pruned to a very sparse level.  ... 
arXiv:1905.09449v5 fatcat:ac4ox2cojrha3nkksi4nqjxkxm

DessiLBI: Exploring Structural Sparsity of Deep Networks via Differential Inclusion Paths [article]

Yanwei Fu, Chen Liu, Donghao Li, Xinwei Sun, Jinshan Zeng, Yuan Yao
2020 arXiv   pre-print
Over-parameterization is ubiquitous nowadays in training neural networks to benefit both optimization in seeking global optima and generalization in reducing prediction error.  ...  Such a differential inclusion scheme has a simple discretization, proposed as Deep structurally splitting Linearized Bregman Iteration (DessiLBI), whose global convergence analysis in deep learning is  ...  Gradient descent finds global minima of deep neural networks. 2018. arXiv:1811.03804. 1 Zhang, Y.  ... 
arXiv:2007.02010v1 fatcat:sg2ozh6fijeqpmeuhovip6g6ee

Impact of Parameter Sparsity on Stochastic Gradient MCMC Methods for Bayesian Deep Learning [article]

Meet P. Vadera, Adam D. Cobb, Brian Jalaian, Benjamin M. Marlin
2022 arXiv   pre-print
Bayesian methods hold significant promise for improving the uncertainty quantification ability and robustness of deep neural network models.  ...  Recent research has seen the investigation of a number of approximate Bayesian inference methods for deep neural networks, building on both the variational Bayesian and Markov chain Monte Carlo (MCMC)  ...  In Algorithm 1, we present the SGHMC algorithm we use for neural networks with sparse substructure.  ... 
arXiv:2202.03770v1 fatcat:6vv6oku6irdwrdasnt7wpwpage

Directional Pruning of Deep Neural Networks [article]

Shih-Kang Chao, Zhanyu Wang, Yue Xing, Guang Cheng
2020 arXiv   pre-print
In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method which searches for a sparse minimizer  ...  SGD.  ...  This work was completed while Guang Cheng was a member of the Institute for Advanced Study, Princeton in the fall of 2019.  ... 
arXiv:2006.09358v2 fatcat:dadslgdh7becnnorheoilkkesa

Data-Driven Sparse Structure Selection for Deep Neural Networks [article]

Zehao Huang, Naiyan Wang
2018 arXiv   pre-print
Deep convolutional neural networks have demonstrated their extraordinary power on various tasks.  ...  In this paper, we propose a simple and effective framework to learn and prune deep models in an end-to-end manner.  ...  for deep neural networks.  ... 
arXiv:1707.01213v3 fatcat:shtsddglafdchdmqueylrkvunq

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks [article]

Jonathan Frankle, Michael Carbin
2019 arXiv   pre-print
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising  ...  However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.  ...  Channel pruning for accelerating very deep neural networks. In International Conference on Computer Vision (ICCV), volume 2, pp. 6, 2017. Geoffrey Hinton, Oriol Vinyals, and Jeff Dean.  ... 
arXiv:1803.03635v5 fatcat:hycg3kxjqbdbbpqz2lq7l252ca
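The iterative magnitude-pruning procedure behind the lottery ticket hypothesis — prune the smallest surviving weights, then rewind the survivors to their original initialization — can be sketched for a single weight tensor as below. A minimal NumPy sketch under stated assumptions: the function name, the prune fraction, and the flat-array representation are illustrative, not from the paper.

```python
import numpy as np

def lottery_ticket_round(w_trained, w_init, mask, prune_frac=0.2):
    """One round of iterative magnitude pruning with rewinding:
    remove the prune_frac smallest-magnitude surviving weights,
    then reset the survivors to their values at initialization."""
    alive = w_trained[mask]
    k = int(len(alive) * prune_frac)
    if k > 0:
        threshold = np.sort(np.abs(alive))[k - 1]   # k-th smallest magnitude
        mask = mask & (np.abs(w_trained) > threshold)
    # The "winning ticket": original init restricted to the surviving mask.
    return np.where(mask, w_init, 0.0), mask
```

Repeating train → prune → rewind for several rounds is what produces the highly sparse, still-trainable subnetworks the abstract refers to.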

Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers [article]

Junjie Liu, Zhe Xu, Runbin Shi, Ray C. C. Cheung, Hayden K.H. So
2020 arXiv   pre-print
We demonstrate that our dynamic sparse training algorithm can easily train very sparse neural network models with little performance loss, using the same number of training epochs as dense models.  ...  We present a novel network pruning algorithm called Dynamic Sparse Training that can jointly find the optimal network parameters and sparse network structure in a unified optimization process with trainable  ...  due to the over-parameterization of deep neural networks.  ... 
arXiv:2005.06870v1 fatcat:vtvnnw2smvbf5cqxtyhcuyin2a
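The trainable-mask idea in this entry — a per-layer threshold that zeroes out small-magnitude weights during the forward pass — can be sketched as below. A minimal NumPy forward-pass sketch, with assumed names; in actual training the binary step would be relaxed (e.g. with a straight-through gradient estimator) so the threshold itself receives gradients, which this sketch does not implement.

```python
import numpy as np

def masked_forward(x, W, t):
    """Forward pass through a linear layer with a threshold-derived
    binary mask: weights whose magnitude falls at or below the layer
    threshold t are treated as pruned (masked to zero)."""
    mask = (np.abs(W) > t).astype(W.dtype)  # 1 where the weight survives
    return x @ (W * mask), mask
```

Because the mask is recomputed every step, weights can be pruned and later recovered as their magnitudes change, which is what makes the sparse structure "dynamic".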

Data-Driven Sparse Structure Selection for Deep Neural Networks [chapter]

Zehao Huang, Naiyan Wang
2018 Lecture Notes in Computer Science  
Deep convolutional neural networks have demonstrated their extraordinary power on various tasks.  ...  In this paper, we propose a simple and effective framework to learn and prune deep models in an end-to-end manner.  ...  for deep neural networks.  ... 
doi:10.1007/978-3-030-01270-0_19 fatcat:qgg6xbjmjbeytp5x7tju6dkvrq

Training Deep Neural Networks via Branch-and-Bound [article]

Yuanwei Wu, Ziming Zhang, Guanghui Wang
2021 arXiv   pre-print
A computationally efficient solver based on BPGrad has been proposed to train deep neural networks.  ...  In this paper, we propose BPGrad, a novel approximate algorithm for deep neural network training, based on adaptive estimates of the feasible region via branch-and-bound.  ...  Earlier work on training neural networks [47] showed that it is difficult to find the global optima because in the worst case even learning a simple 3-node neural network is NP-complete.  ... 
arXiv:2104.01730v2 fatcat:wyzu7hguuba4ziwnxbwxhokcae

Dynamic Model Pruning with Feedback [article]

Tao Lin, Sebastian U. Stich, Luis Barba, Daniil Dmitriev, Martin Jaggi
2020 arXiv   pre-print
Deep neural networks often have millions of parameters.  ...  signal to reactivate prematurely pruned weights we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance).  ...  Highly overparametrized deep neural networks show impressive results on machine learning tasks.  ... 
arXiv:2006.07253v1 fatcat:5t5ijn7hzfh3xiuy6hbsvjvdju

Sparse Weight Activation Training [article]

Md Aamir Raihan, Tor M. Aamodt
2020 arXiv   pre-print
Neural network training is computationally and memory intensive.  ...  Sparse training can reduce the burden on emerging hardware platforms designed to accelerate sparse computations, but it can affect network convergence.  ...  Channel pruning for accelerating very deep neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1389-1397, 2017. [22] C. A. R. Hoare. Algorithm 65: Find.  ... 
arXiv:2001.01969v3 fatcat:twbzslvhgnh7jk4sauhbqk6ql4

LOss-Based SensiTivity rEgulaRization: towards deep sparse neural networks [article]

Enzo Tartaglione, Andrea Bragagnolo, Attilio Fiandrotti, Marco Grangetto
2020 arXiv   pre-print
LOBSTER (LOss-Based SensiTivity rEgulaRization) is a method for training neural networks having a sparse topology.  ...  Parameters with low sensitivity, i.e. having little impact on the loss when perturbed, are shrunk and then pruned to sparsify the network.  ...  In Sec. 2 we review the relevant literature concerning sparse neural architectures. Next, in Sec. 3 we describe our method for training a neural network such that its topology is sparse.  ... 
arXiv:2011.09905v1 fatcat:mkgomcochfda5iojhep4fn5dbi

A Bregman Learning Framework for Sparse Neural Networks [article]

Leon Bungert, Tim Roith, Daniel Tenbrinck, Martin Burger
2022 arXiv   pre-print
In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is solely optimization-based, without additional heuristics  ...  Our Bregman learning framework starts the training with very few initial parameters, successively adding only significant ones to obtain a sparse and expressive network.  ...  Additionally we thank for the financial support by the Cluster of Excellence "Engineering of Advanced Materials" (EAM) and the "Competence Unit for Scientific Computing" (CSC) at the University of Erlangen-Nürnberg  ... 
arXiv:2105.04319v3 fatcat:tyiiilombrdybi7rkfjicgdscm
Showing results 1 — 15 out of 573 results