
Capacity Control of ReLU Neural Networks by Basis-path Norm [article]

Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu
2018 arXiv   pre-print
Motivated by this, we propose a new norm, Basis-path Norm, based on a group of linearly independent paths to measure the capacity of neural networks more accurately.  ...  Recently, path norm was proposed as a new capacity measure for neural networks with Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account.  ...  Conclusion: In this paper, we define Basis-path norm on the group of basis paths, and prove that the generalization error of ReLU neural networks can be upper bounded by a function of BP norm.  ... 
arXiv:1809.07122v1 fatcat:xiq4uvsatnbuld7677uzb7e5dy
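The path norm referenced in this snippet can be made concrete. The sketch below illustrates the general path-norm idea (an l1-style sum over paths, following Neyshabur et al.), not the paper's basis-path construction; all sizes and names are made up for illustration.

```python
# Illustrative l1-style path norm of a two-layer ReLU network
# f(x) = W2 @ relu(W1 @ x): sum over all input -> hidden -> output
# paths of |W1[j, i] * W2[k, j]|. Not the paper's basis-path norm.
import jax.numpy as jnp
from jax import random

k1, k2 = random.split(random.PRNGKey(0))
W1 = random.normal(k1, (4, 3))   # hidden x input
W2 = random.normal(k2, (2, 4))   # output x hidden

def l1_path_norm(W1, W2):
    # entry [k, i] of |W2| @ |W1| sums |W2[k, j]| * |W1[j, i]| over j,
    # i.e. one term per path through hidden unit j
    return jnp.sum(jnp.abs(W2) @ jnp.abs(W1))

print(l1_path_norm(W1, W2))
```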

Capacity Control of ReLU Neural Networks by Basis-Path Norm

Shuxin Zheng, Qi Meng, Huishuai Zhang, Wei Chen, Nenghai Yu, Tie-Yan Liu
2019 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-19)
Motivated by this, we propose a new norm, Basis-path Norm, based on a group of linearly independent paths to measure the capacity of neural networks more accurately.  ...  Recently, path norm was proposed as a new capacity measure for neural networks with Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account.  ...  Acknowledgments: This work was partially supported by the National Natural Science Foundation of China under Grant U1636201. We would like to show our gratitude to Prof.  ... 
doi:10.1609/aaai.v33i01.33015925 fatcat:for3dmin4vhctoeqtfsnlpx56m

What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory [article]

Rahul Parhi, Robert D. Nowak
2021 arXiv   pre-print
The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization.  ...  These are Banach spaces with sparsity-promoting norms, giving insight into the role of sparsity in deep neural networks.  ...  We also remark that the work in [7] shows that the path-norm in (4.8) controls the Rademacher and Gaussian complexity of deep ReLU networks.  ... 
arXiv:2105.03361v3 fatcat:6mdcw2ggsfflhefw5caqakpncu
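The link between weight decay and path-norm regularization mentioned in the snippet can be seen on a single ReLU unit x -> a * relu(w @ x): rescaling (w, a) -> (c*w, a/c) with c > 0 leaves the function unchanged, and the balanced rescaling drives the weight-decay term (||w||^2 + a^2)/2 down to its minimum |a| * ||w||_2, which is the path-norm term. A minimal numerical check of this standard fact (illustrative, not code from the paper):

```python
# For one ReLU unit x -> a * relu(w @ x): positive rescaling
# (w, a) -> (c * w, a / c) keeps the function identical, and choosing
# c = sqrt(|a| / ||w||) balances the layers so that the weight-decay
# term 0.5 * (||w||^2 + a^2) equals the path-norm term |a| * ||w||.
import jax.numpy as jnp
from jax import random

w = random.normal(random.PRNGKey(1), (5,))
a = 2.7

def weight_decay(w, a):
    return 0.5 * (jnp.sum(w ** 2) + a ** 2)

c = jnp.sqrt(jnp.abs(a) / jnp.linalg.norm(w))
print(weight_decay(w, a))              # >= |a| * ||w||  (AM-GM)
print(weight_decay(c * w, a / c))      # == |a| * ||w||  (balanced)
print(jnp.abs(a) * jnp.linalg.norm(w))
```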

Implicit Regularization in Deep Learning [article]

Behnam Neyshabur
2017 arXiv   pre-print
We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that have similar invariances to those in neural networks and evaluate them on a number of learning  ...  We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep learning models.  ...  What is the bias introduced by these algorithmic choices for neural networks? What is the relevant notion of complexity or capacity control?  ... 
arXiv:1709.01953v2 fatcat:o3xzvsq2dfaoxceks5bsx6lcs4

Positively Scale-Invariant Flatness of ReLU Neural Networks [article]

Mingyang Yi, Qi Meng, Wei Chen, Zhi-ming Ma, Tie-Yan Liu
2019 arXiv   pre-print
Values of basis paths have been shown to be the PSI-variables and can sufficiently represent ReLU neural networks, which ensures the PSI property of PSI-flatness.  ...  of ReLU network.  ...  Definition 4.1 (PSI-flatness): Representing ReLU NN by values of basis paths as  ...  Assumption 5.1: The L2 norm of the input of every layer can be upper bounded by a constant C. Assumption 5.2: The loss  ... 
arXiv:1903.02237v1 fatcat:7ok3ds7lczad7h5bkm4kinbr3e
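The positively scale-invariant (PSI) property behind this snippet is easy to verify numerically: multiplying the incoming weights of a hidden ReLU unit by c > 0 and dividing its outgoing weights by c changes neither the network function nor any path value. A minimal check (illustrative sizes, not the paper's construction):

```python
# Positive scale-invariance (PSI) of ReLU networks: rescale hidden unit j
# (incoming row * c, outgoing column / c, c > 0); the output and the path
# products W2[k, j] * W1[j, i] through unit j are unchanged.
import jax.numpy as jnp
from jax import random

k1, k2, k3 = random.split(random.PRNGKey(2), 3)
W1 = random.normal(k1, (4, 3))
W2 = random.normal(k2, (2, 4))
x = random.normal(k3, (3,))

def net(W1, W2, x):
    return W2 @ jnp.maximum(0.0, W1 @ x)

c, j = 3.0, 1
W1s = W1.at[j, :].multiply(c)
W2s = W2.at[:, j].divide(c)

print(jnp.allclose(net(W1, W2, x), net(W1s, W2s, x)))      # True: same function
print(jnp.allclose(jnp.outer(W2[:, j], W1[j, :]),
                   jnp.outer(W2s[:, j], W1s[j, :])))        # True: same path values
```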

Sobolev training of thermodynamic-informed neural networks for smoothed elasto-plasticity models with level set hardening [article]

Nikolaos N. Vlassis, WaiChing Sun
2020 arXiv   pre-print
Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as  ...  deep neural network predictions.  ...  Acknowledgments: The authors are supported by the NSF CAREER grant from the Mechanics of Materials and Structures program at the National Science Foundation under grant contracts CMMI-1846875 and OAC-1940203  ... 
arXiv:2010.11265v1 fatcat:tdqkvjyutnd7rdiworaqn6z724
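Sobolev training, named in the title, fits a network to reference values and to reference derivatives at the same time. The sketch below shows the generic idea on a scalar toy target (the toy function, the layer sizes, and the weighting lam are assumptions, not the paper's elasto-plasticity model): the loss penalizes mismatch in both f(x) and df/dx, with the network derivative taken by automatic differentiation.

```python
# Generic Sobolev-training loss sketch (illustrative, not the paper's model):
# match both the target function and its derivative, obtained via jax.grad.
import jax.numpy as jnp
from jax import grad, vmap, random

def mlp(params, x):
    (W1, b1), (W2, b2) = params
    h = jnp.tanh(W1 @ jnp.atleast_1d(x) + b1)
    return (W2 @ h + b2)[0]

target = jnp.sin                            # toy reference response
target_grad = grad(lambda x: jnp.sin(x))    # its derivative

def sobolev_loss(params, xs, lam=1.0):
    f = vmap(lambda x: mlp(params, x))(xs)
    df = vmap(grad(lambda x: mlp(params, x)))(xs)
    value_err = jnp.mean((f - vmap(target)(xs)) ** 2)
    grad_err = jnp.mean((df - vmap(target_grad)(xs)) ** 2)
    return value_err + lam * grad_err       # H^1-style (Sobolev) objective

k1, k2 = random.split(random.PRNGKey(3))
params = [(random.normal(k1, (16, 1)), jnp.zeros(16)),
          (random.normal(k2, (1, 16)), jnp.zeros(1))]
xs = jnp.linspace(-2.0, 2.0, 64)
print(sobolev_loss(params, xs))
```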

Neural Radiosity [article]

Saeed Hadadan, Shuhong Chen, Matthias Zwicker
2021 arXiv   pre-print
We introduce Neural Radiosity, an algorithm to solve the rendering equation by minimizing the norm of its residual, similar to traditional radiosity techniques.  ...  Instead, we propose to leverage neural networks to represent the full four-dimensional radiance distribution, directly optimizing network parameters to minimize the norm of the residual.  ...  If the network capacity is unlimited, it is guaranteed to converge to the exact solution where the residual norm vanishes.  ... 
arXiv:2105.12319v1 fatcat:6nr62aopaff6tdtbmhw3lp36vm
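The core idea, minimizing the norm of the residual of an integral equation with a neural representation of its solution, can be sketched on a toy one-dimensional Fredholm equation u(x) = e(x) + integral of k(x, y) u(y) dy, with the integral estimated by Monte Carlo. Everything below (the kernel, the "emission" term, the tiny network, the optimizer) is an illustrative stand-in, not the rendering equation or the paper's architecture.

```python
# Residual-norm minimization for a toy integral equation u = e + K u,
# as an illustrative stand-in for the rendering-equation residual.
import jax.numpy as jnp
from jax import grad, random, vmap

def u_theta(theta, x):                       # tiny network representing the solution
    W1, b1, w2 = theta
    return jnp.tanh(W1 * x + b1) @ w2

def e(x):                                    # toy "emission" term
    return jnp.sin(jnp.pi * x)

def k(x, y):                                 # toy, contractive kernel
    return 0.3 * jnp.exp(-(x - y) ** 2)

def residual_loss(theta, xs, ys):
    u_mc = vmap(lambda y: u_theta(theta, y))(ys)          # u at Monte Carlo samples
    Ku = vmap(lambda x: jnp.mean(k(x, ys) * u_mc))(xs)    # MC estimate of (K u)(x)
    res = vmap(lambda x: u_theta(theta, x))(xs) - e(xs) - Ku
    return jnp.mean(res ** 2)                             # squared residual norm

k1, k2, k3, k4 = random.split(random.PRNGKey(4), 4)
theta = (random.normal(k1, (8,)), random.normal(k2, (8,)),
         0.1 * random.normal(k3, (8,)))
xs = jnp.linspace(0.0, 1.0, 32)              # residual evaluation points
ys = random.uniform(k4, (64,))               # Monte Carlo samples for the integral

for _ in range(200):                         # plain gradient descent on the residual
    g = grad(residual_loss)(theta, xs, ys)
    theta = tuple(t - 0.05 * gi for t, gi in zip(theta, g))
print(residual_loss(theta, xs, ys))
```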

Regularizing activations in neural networks via distribution matching with the Wasserstein metric [article]

Taejong Joo, Donggu Kang, Byunghoon Kim
2020 arXiv   pre-print
Regularization and normalization have become indispensable components in training deep neural networks, resulting in faster training and improved generalization performance.  ...  By doing so, PER minimizes the upper bound of the Wasserstein distance of order one between an empirical distribution of activations and the standard normal distribution.  ...  Among various techniques of controlling activations, one well-known and successful path is controlling their first and second moments.  ... 
arXiv:2002.05366v2 fatcat:c5iij45n4bg2fob6beffmu475i
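The snippet describes penalizing the Wasserstein distance of order one between the empirical distribution of activations and the standard normal. For two equal-size one-dimensional samples, W1 equals the mean absolute difference of the sorted samples, which gives a simple sampled surrogate penalty. The sketch below uses that fact as an illustration of the general idea; it is not the paper's PER bound.

```python
# Illustrative sampled surrogate (not the paper's exact PER bound):
# 1-Wasserstein distance between the empirical distribution of a layer's
# activations and an equal-size sample from N(0, 1). In 1-D with equal
# sample sizes, W1 is the mean absolute difference of the sorted samples.
import jax.numpy as jnp
from jax import random

def w1_to_standard_normal(activations, key):
    a = jnp.sort(activations.ravel())
    z = jnp.sort(random.normal(key, a.shape))
    return jnp.mean(jnp.abs(a - z))

k1, k2 = random.split(random.PRNGKey(5))
acts = 2.0 + 0.5 * random.normal(k1, (128, 32))   # pretend pre-activations of a layer
penalty = w1_to_standard_normal(acts, k2)
print(penalty)   # added to the task loss with a small coefficient in practice
```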

The Role of Linear Layers in Nonlinear Interpolating Networks [article]

Greg Ongie, Rebecca Willett
2022 arXiv   pre-print
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias  ...  This paper explores the implicit bias of overparameterized neural networks of depth greater than two layers.  ...  L = 2 (i.e., a single hidden-layer ReLU network with no additional linear layers), we have $\Phi_2(W, a) = \sum_{k=1}^{K} |a_k| \, \|w_k\|_2$. (13) This has been referred to as the "path norm" by Neyshabur et al. (  ... 
arXiv:2202.00856v1 fatcat:jy73l6zsqfdhjm6zu4dyynnr7y
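The quoted formula for the L = 2 case is a direct sum over hidden units; a minimal computation of $\Phi_2(W, a) = \sum_k |a_k| \|w_k\|_2$ for a single-hidden-layer ReLU network $x \mapsto \sum_k a_k \,\mathrm{relu}(w_k^\top x)$ (shapes are illustrative):

```python
# Phi_2(W, a) = sum_k |a_k| * ||w_k||_2 for a single-hidden-layer ReLU
# network; the w_k are the rows of W (illustrative sizes).
import jax.numpy as jnp
from jax import random

k1, k2 = random.split(random.PRNGKey(6))
W = random.normal(k1, (10, 4))   # K = 10 hidden units, 4 inputs
a = random.normal(k2, (10,))     # outer-layer weights

def phi2(W, a):
    return jnp.sum(jnp.abs(a) * jnp.linalg.norm(W, axis=1))

print(phi2(W, a))
```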

Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies

Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, Yoshua Bengio
2019 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-19)
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training as the sequence length increases.  ...  compared against a range of other architectures.  ...  Introduction: Vanishing and exploding gradients remain a core challenge in the training of recurrent neural networks.  ... 
doi:10.1609/aaai.v33i01.33013280 fatcat:nc3rcfbhknhvbc7iom5hpwswty

Sample Compression, Support Vectors, and Generalization in Deep Learning [article]

Christopher Snyder, Sriram Vishwanath
2020 arXiv   pre-print
Then, using a max-margin assumption, the paper develops a sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s "support vectors".  ...  The paper shows that the number of support vectors s relates with learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/epsilon) for networks with  ...  For example, [16] analyzes the number of nonzero weights as a form of capacity control, while others have studied approximating a deep network by a "compressed" version with fewer nonzero weights [17  ... 
arXiv:1811.02067v4 fatcat:5irjhduq6zew3ocg36zz3zrmgm

How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [article]

Keyulu Xu, Mozhi Zhang, Jingling Li, Simon S. Du, Ken-ichi Kawarabayashi, Stefanie Jegelka
2021 arXiv   pre-print
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution.  ...  Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings.  ...  This research was supported by NSF CAREER award 1553284, NSF III 1900933, and a Chevron-MIT Energy Fellowship. This research was also supported by JST ERATO JPMJER1201 and JSPS Kakenhi JP18H05291.  ... 
arXiv:2009.11848v5 fatcat:spopbpkhwfcelksb4qf264htfa

Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies [article]

Sarath Chandar, Chinnadhurai Sankar, Eugene Vorontsov, Samira Ebrahimi Kahou, Yoshua Bengio
2019 arXiv   pre-print
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training as the sequence length increases.  ...  compared against a range of other architectures.  ...  Introduction: Vanishing and exploding gradients remain a core challenge in the training of recurrent neural networks.  ... 
arXiv:1902.06704v1 fatcat:z5mgipefpjeyhmfnyecbrgpqbu

Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

Zuowei Shen, Haizhao Yang, Shijun Zhang
2021 Neural Computation  
This network is built with Floor (⌊x⌋) or ReLU (max{0, x}) activation function in each neuron; hence, we call such networks Floor-ReLU networks.  ...  As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of [Formula: see text] as [Formula: see text] is moderate (e.g., [Formula: see  ...  The third and fourth terms of equation 2.2 are usually bounded in terms of the sample size n and a certain norm of θ_N and θ_D (e.g., ℓ1, ℓ2, or the path norm), respectively.  ... 
doi:10.1162/neco_a_01364 pmid:33513325 fatcat:anugapaxrnbvpp5pgnpvpzmo6m
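The snippet describes networks in which each neuron uses either the floor function ⌊x⌋ or ReLU max{0, x} as its activation. A minimal forward-pass sketch of such a layer (layer widths, which units use floor, and the weights are all illustrative choices, not the paper's construction):

```python
# Illustrative forward pass of a "Floor-ReLU" layer: each neuron applies
# either floor(z) or relu(z) = max(0, z), selected by a boolean mask.
import jax.numpy as jnp
from jax import random

def floor_relu_layer(W, b, use_floor, x):
    z = W @ x + b
    return jnp.where(use_floor, jnp.floor(z), jnp.maximum(0.0, z))

k1, k2, k3 = random.split(random.PRNGKey(7), 3)
W1, b1 = random.normal(k1, (6, 3)), jnp.zeros(6)
W2, b2 = random.normal(k2, (2, 6)), jnp.zeros(2)
mask1 = jnp.array([True, False, True, False, True, False])  # which units use floor

x = random.normal(k3, (3,))
h = floor_relu_layer(W1, b1, mask1, x)
y = W2 @ h + b2      # linear output layer
print(y)
```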

A Correspondence between Normalization Strategies in Artificial and Biological Neural Networks

Yang Shen, Julia Wang, Saket Navlakha
2021 Neural Computation  
As a proof of concept, we develop an algorithm, inspired by a neural normalization technique called synaptic scaling, and show that this algorithm performs competitively against existing normalization  ...  A fundamental challenge at the interface of machine learning and neuroscience is to uncover computational principles that are shared between artificial and biological neural networks.  ...  S.N. was supported by the Pew Charitable Trusts, the National Institutes of Health under awards 1R01DC017695 and 1UF1NS111692, and funding from the Simons Center for Quantitative Biology at Cold Spring  ... 
doi:10.1162/neco_a_01439 pmid:34474484 pmcid:PMC8662716 fatcat:x5tsd6y3ijhtripppasnaa6eze
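The snippet mentions an algorithm inspired by synaptic scaling, a homeostatic mechanism in which a neuron multiplicatively rescales its incoming weights to keep its average activity near a target. The sketch below is a generic illustration of that mechanism only; the target rate, step size, and update rule are assumptions, not the paper's algorithm.

```python
# Generic synaptic-scaling illustration (not the paper's algorithm): each
# unit multiplicatively rescales its incoming weights so that its average
# activation over a batch drifts toward a fixed homeostatic target.
import jax.numpy as jnp
from jax import random

def synaptic_scaling_step(W, X, target=1.0, eta=0.1):
    A = jnp.maximum(0.0, X @ W.T)        # ReLU activations, shape (batch, units)
    rates = jnp.mean(A, axis=0)          # average activity per unit
    scale = 1.0 + eta * (target - rates) / (rates + 1e-6)
    return W * scale[:, None]            # per-unit multiplicative rescaling

k1, k2 = random.split(random.PRNGKey(8))
W = 0.1 * jnp.abs(random.normal(k1, (16, 8)))   # units x inputs (non-negative toy weights)
X = jnp.abs(random.normal(k2, (256, 8)))        # non-negative inputs keep every unit active

for _ in range(50):
    W = synaptic_scaling_step(W, X)
print(jnp.mean(jnp.maximum(0.0, X @ W.T), axis=0))   # approaches the target rate
```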
Showing results 1–15 of 860