On the Quality of the Initial Basin in Overspecified Neural Networks [article]

Itay Safran, Ohad Shamir
2016 arXiv pre-print
In this work, we study the geometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters.  ...  Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications.  ...  We thank Lukasz Kaiser for pointing out a bug (as well as the fix) in an earlier version of the paper.  ... 
arXiv:1511.04210v3 fatcat:cerbrbaycbdxflhdo4jknjvw6y

Towards Robust Deep Neural Networks [article]

Timothy E. Wang, Yiming Gu, Dhagash Mehta, Xiaojun Zhao, Edgar A. Bernal
2018 arXiv pre-print
We investigate the topics of sensitivity and robustness in feedforward and convolutional neural networks.  ...  In our experiments on standard machine learning and computer vision datasets, we show that the proposed loss function leads to networks which reliably optimize the robustness measure as well as other related  ...  Sensitivity In this section, we investigate the quality of the different minima in the optimization landscape of neural networks by leveraging a combination of energy landscape approaches developed by  ... 
arXiv:1810.11726v2 fatcat:u37tjamy2bduzpm7bhagqex2v4

Loss surface of XOR artificial neural networks

Dhagash Mehta, Xiaojun Zhao, Edgar A. Bernal, David J. Wales
2018 Physical Review E
The number of local minima and transition states (saddle points of index one), as well as the ratio of transition states to minima, grow rapidly with the number of nodes in the network.  ...  Training an artificial neural network involves an optimization process over the landscape defined by the cost (loss) as a function of the network parameters.  ...  quality of the network performance on the training set (i.e., the empirical error).  ... 
doi:10.1103/physreve.97.052307 pmid:29906831 fatcat:jtxgdhxfpnesrna63z6j36kn3a

Gradient Descent Finds Global Minima of Deep Neural Networks [article]

Simon S. Du, Jason D. Lee, Haochuan Li, Liwei Wang, Xiyu Zhai
2019 arXiv pre-print
Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture.  ...  Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex.  ...  On the quality of the initial basin in overspecified neural networks. In International Conference on Machine Learning, pages 774-782, 2016. Itay Safran and Ohad Shamir.  ... 
arXiv:1811.03804v4 fatcat:xi5pzfzbsvayjlvzvi2gksj22q

Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks [article]

Alexander Shevchenko, Marco Mondelli
2020 arXiv pre-print
In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately  ...  We show that, if we remove part of the neurons (and suitably rescale the remaining ones), the change in loss is independent of the total number of neurons, and it depends only on how many neurons are left  ...  The authors thank Phan-Minh Nguyen for helpful discussions and the IST Distributed Algorithms and Systems Lab for providing computational resources.  ... 
arXiv:1912.10095v2 fatcat:w5qd2v2tavdftjsx6byuzvzawi

A Recipe for Global Convergence Guarantee in Deep Neural Networks [article]

Kenji Kawaguchi, Qingyun Sun
2021 arXiv pre-print
Existing global convergence guarantees of (stochastic) gradient descent do not apply to practical deep networks in the practical regime of deep learning beyond the neural tangent kernel (NTK) regime.  ...  On the one hand, the expressivity condition is theoretically proven to hold data-independently for fully-connected deep neural networks with narrow hidden layers and a single wide layer.  ...  Acknowledgments This work is partially supported by the Center of Mathematical Sciences and Applications at Harvard University. The authors thank Yiqiao Zhong, Mengyuan Yan for discussion.  ... 
arXiv:2104.05785v2 fatcat:q3npmkyxgbekzaxgsreyjhk6eq

Deep Unfolding of the DBFB Algorithm with Application to ROI CT Imaging with Limited Angular Density [article]

Marion Savanier, Emilie Chouzenoux, Jean-Christophe Pesquet, Cyril Riddell
2022 arXiv pre-print
They incorporate the physics of the model and iterative optimization algorithms into a neural network design, leading to superior performance in various applications.  ...  Iterations of a block dual forward-backward (DBFB) algorithm, embedded in an iterative reweighted scheme, are then unrolled over a neural network architecture, allowing the learning of various parameters  ...  On the quality of the initial basin in overspecified neural networks.  ...  [24] Y. Yang, J. Sun, H. Li, and Z. Xu. Deep ADMM-Net for Compressive  ... 
arXiv:2209.13264v1 fatcat:bgkkn3l67vgnrmt6ydntax3r3i

Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced [article]

Simon S. Du, Wei Hu, Jason D. Lee
2018 arXiv pre-print
This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers.  ...  We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions including feed-forward fully connected and convolutional deep neural networks with linear  ...  Acknowledgements We thank Phil Long for his helpful comments on an earlier draft of this paper. JDL acknowledges support from ARO W911NF-11-1-0303.  ... 
arXiv:1806.00900v2 fatcat:aqaj5tlxuffb3deqballwapvdy

On the loss landscape of a class of deep neural networks with no bad local valleys [article]

Quynh Nguyen, Mahesh Chandra Mukkamala, Matthias Hein
2018 arXiv pre-print
We identify a class of over-parameterized deep neural networks with standard activation functions and cross-entropy loss which provably have no bad local valley, in the sense that from any point in parameter  ...  space there exists a continuous path on which the cross-entropy loss is non-increasing and gets arbitrarily close to zero.  ...  On the quality of the initial basin in overspecified networks. ICML, 2016. I. Safran and O. Shamir. Spurious local minima are common in two-layer relu neural networks. ICML, 2018. H. Sedghi and A.  ... 
arXiv:1809.10749v2 fatcat:zvv47luo5bfcvajcnvpzyba5pm

Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs [article]

Alon Brutzkus, Amir Globerson
2017 arXiv pre-print
To the best of our knowledge, this is the first global optimality guarantee of gradient descent on a convolutional neural network with ReLU activations.  ...  The key question is then under what conditions can one prove that optimization will succeed. Here we provide a strong result of this kind.  ...  On the quality of the initial basin in overspecified neural networks. In  ...  A Proof of Lemma 3.2. First assume that θ_{u,v} = 0, π.  ... 
arXiv:1702.07966v1 fatcat:tzitjxzgxzfyldwetrqqy73kl4

Spurious Valleys in Two-layer Neural Network Optimization Landscapes [article]

Luca Venturi, Afonso S. Bandeira, Joan Bruna
2020 arXiv pre-print
Focusing on a class of two-layer neural networks defined by smooth (but generally non-linear) activation functions, we identify a notion of intrinsic dimension and show that it provides necessary and sufficient  ...  In this paper, we address this phenomenon by studying a key topological property of the loss: the presence or absence of spurious valleys, defined as connected components of sub-level sets that do not  ...  LV would also like to thank Jumageldi Charyyev for fruitful discussions on the proofs of several propositions and Andrea Ottolini for valuable comments on a previous version of this manuscript.  ... 
arXiv:1802.06384v4 fatcat:aitb2ohv7neorkgcz5c2fltpoi

Learning Graph Neural Networks with Approximate Gradient Descent [article]

Qunwei Li, Shaofeng Zou, Wenliang Zhong
2020 arXiv pre-print
The first provably efficient algorithm for learning graph neural networks (GNNs) with one hidden layer for node information convolution is provided in this paper.  ...  For both types of GNNs, sample complexity in terms of the number of nodes or the number of graphs is characterized.  ...  On the quality of the initial basin in overspecified neural networks. In International Conference on Machine Learning, 774-782. Safran, I.; and Shamir, O. 2018.  ... 
arXiv:2012.03429v1 fatcat:ziacokhfyfallg2zzaunk7zb7u

Topology and Geometry of Half-Rectified Network Optimization [article]

C. Daniel Freeman, Joan Bruna
2017 arXiv pre-print
The loss surface of deep neural networks has recently attracted interest in the optimization and machine learning communities as a prime example of a high-dimensional non-convex problem.  ...  We study this question through the geometry of the level sets, and we introduce an algorithm to efficiently estimate the regularity of such sets on large-scale networks.  ...  University of California, Berkeley, 1050, 16. Safran, Itay, & Shamir, Ohad. 2015. On the quality of the initial basin in overspecified neural networks. arXiv preprint arXiv:1511.04210.  ... 
arXiv:1611.01540v4 fatcat:c73wywagmfd5fgfdz237mpeqky

On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them [article]

Chen Liu, Mathieu Salzmann, Tao Lin, Ryota Tomioka, Sabine Süsstrunk
2020 arXiv pre-print
We analyze the influence of adversarial training on the loss landscape of machine learning models.  ...  much less sensitive to the choice of learning rate.  ...  Overfitting in adversarially robust deep learning. arXiv preprint arXiv:2002.11569, 2020. [38] Itay Safran and Ohad Shamir. On the quality of the initial basin in overspecified neural networks.  ... 
arXiv:2006.08403v2 fatcat:nv3gg4kjmrauhm5wq4lm4u75ce

The Modern Mathematics of Deep Learning [article]

Julius Berner, Philipp Grohs, Gitta Kutyniok, Philipp Petersen
2021 arXiv pre-print
These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly  ...  fine aspects of an architecture affect the behavior of a learning task in which way.  ...  [SS16] Itay Safran and Ohad Shamir, On the quality of the initial basin in overspecified neural networks, International Conference on Machine Learning, 2016, pp. 774–782.  ... 
arXiv:2105.04026v1 fatcat:lxnfyzr6qfasneo433inpgseia