Capacity Control of ReLU Neural Networks by Basis-path Norm
[article]
2018
arXiv
pre-print
Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths, to measure the capacity of neural networks more accurately. ...
Recently, the path norm was proposed as a new capacity measure for neural networks with the Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. ...
Conclusion: In this paper, we define the Basis-path norm on the group of basis paths and prove that the generalization error of ReLU neural networks can be upper bounded by a function of the BP norm. ...
arXiv:1809.07122v1
fatcat:xiq4uvsatnbuld7677uzb7e5dy
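Note on the entry above: the path-based measures it mentions are easy to state concretely. In a fully connected ReLU network, a path picks one unit per layer from an input to an output, and the standard l1 path norm is the sum over all such paths of the product of the absolute weights along the path; the Basis-path Norm of the paper restricts this sum to a linearly independent subset of paths. The sketch below (our own illustration with hypothetical names, not the authors' code) enumerates the paths of a tiny two-layer network and computes only the plain l1 path norm; it does not construct basis paths.

# Illustrative sketch (not the paper's code): enumerate the input-output paths
# of a small fully connected ReLU network and compute the l1 path norm, i.e.
# the sum over paths of the product of absolute weights along each path. The
# Basis-path Norm restricts this sum to a linearly independent subset of paths,
# which is not constructed here.
import itertools
import numpy as np

def l1_path_norm(weights):
    """weights: list of matrices W_l with shape (n_l, n_{l-1})."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    total = 0.0
    # A path picks one unit per layer, from an input unit to an output unit.
    for path in itertools.product(*[range(n) for n in sizes]):
        prod = 1.0
        for l, W in enumerate(weights):
            prod *= abs(W[path[l + 1], path[l]])
        total += prod
    return total

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden layer: 3 inputs -> 4 hidden units
W2 = rng.normal(size=(2, 4))   # output layer: 4 hidden units -> 2 outputs
print(l1_path_norm([W1, W2]))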
Capacity Control of ReLU Neural Networks by Basis-Path Norm
2019
PROCEEDINGS OF THE THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-19)
Motivated by this, we propose a new norm, the Basis-path Norm, based on a group of linearly independent paths, to measure the capacity of neural networks more accurately. ...
Recently, the path norm was proposed as a new capacity measure for neural networks with the Rectified Linear Unit (ReLU) activation function, which takes the rescaling-invariant property of ReLU into account. ...
Acknowledgments: This work was partially supported by the National Natural Science Foundation of China under Grant U1636201. We would like to show our gratitude to Prof. ...
doi:10.1609/aaai.v33i01.33015925
fatcat:for3dmin4vhctoeqtfsnlpx56m
What Kinds of Functions do Deep Neural Networks Learn? Insights from Variational Spline Theory
[article]
2021
arXiv
pre-print
The variational problem we study can be recast as a finite-dimensional neural network training problem with regularization schemes related to the notions of weight decay and path-norm regularization. ...
These are Banach spaces with sparsity-promoting norms, giving insight into the role of sparsity in deep neural networks. ...
We also remark that the work in [7] shows that the path-norm in (4.8) controls the Rademacher and Gaussian complexity of deep ReLU networks.
Conclusion. ...
arXiv:2105.03361v3
fatcat:6mdcw2ggsfflhefw5caqakpncu
Implicit Regularization in Deep Learning
[article]
2017
arXiv
pre-print
We further study the invariances in neural networks, suggest complexity measures and optimization algorithms that have similar invariances to those in neural networks and evaluate them on a number of learning ...
We show that implicit regularization induced by the optimization method is playing a key role in generalization and success of deep learning models. ...
What is the bias introduced by these algorithmic choices for neural networks? What is the relevant notion of complexity or capacity control? ...
arXiv:1709.01953v2
fatcat:o3xzvsq2dfaoxceks5bsx6lcs4
Positively Scale-Invariant Flatness of ReLU Neural Networks
[article]
2019
arXiv
pre-print
Values of basis paths have been shown to be the PSI-variables and can sufficiently represent ReLU neural networks, which ensures the PSI property of PSI-flatness. ...
of ReLU network. ...
Definition 4.1 (PSI-flatness). Representing ReLU NN by values of basis paths as
Assumption 5.1. The $L_2$ norm of the input of every layer can be upper bounded by a constant C. Assumption 5.2. The loss ...
arXiv:1903.02237v1
fatcat:7ok3ds7lczad7h5bkm4kinbr3e
Sobolev training of thermodynamic-informed neural networks for smoothed elasto-plasticity models with level set hardening
[article]
2020
arXiv
pre-print
Our numerical experiments reveal that this new approach provides more robust and accurate forward predictions of cyclic stress paths than those obtained from black-box deep neural network models such as ...
deep neural network predictions. ...
Acknowledgments: The authors are supported by the NSF CAREER grant from the Mechanics of Materials and Structures program at the National Science Foundation under grant contracts CMMI-1846875 and OAC-1940203 ...
arXiv:2010.11265v1
fatcat:tdqkvjyutnd7rdiworaqn6z724
Neural Radiosity
[article]
2021
arXiv
pre-print
We introduce Neural Radiosity, an algorithm to solve the rendering equation by minimizing the norm of its residual, similarly to traditional radiosity techniques. ...
Instead, we propose to leverage neural networks to represent the full four-dimensional radiance distribution, directly optimizing network parameters to minimize the norm of the residual. ...
If the network capacity is unlimited, it is guaranteed to converge to the exact solution where the residual norm vanishes. ...
arXiv:2105.12319v1
fatcat:6nr62aopaff6tdtbmhw3lp36vm
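Note on the entry above: the core idea, minimizing the norm of the residual of an integral equation with a neural network, can be illustrated on a much simpler problem than the rendering equation. The sketch below is our own toy one-dimensional analogue with made-up functions g and k; it is not the paper's method or scene representation.

# Toy analogue (our construction, not the paper's renderer): solve the 1-D
# integral equation f(x) = g(x) + 0.5 * integral_0^1 k(x, y) f(y) dy by
# minimizing a Monte Carlo estimate of the squared norm of its residual with
# a small neural network f, in the spirit of residual minimization.
import torch

torch.manual_seed(0)

f = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

def g(x):                        # made-up source term
    return torch.sin(torch.pi * x)

def k(x, y):                     # made-up smooth, contractive kernel
    return torch.exp(-(x - y) ** 2)

opt = torch.optim.Adam(f.parameters(), lr=1e-3)
for step in range(2000):
    x = torch.rand(256, 1)       # points where the residual is evaluated
    y = torch.rand(1, 128)       # Monte Carlo samples for the integral over [0, 1]
    integral = 0.5 * (k(x, y) * f(y.reshape(-1, 1)).reshape(1, -1)).mean(dim=1, keepdim=True)
    residual = f(x) - g(x) - integral
    loss = residual.pow(2).mean()    # squared residual norm (Monte Carlo estimate)
    opt.zero_grad()
    loss.backward()
    opt.step()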
Regularizing activations in neural networks via distribution matching with the Wasserstein metric
[article]
2020
arXiv
pre-print
Regularization and normalization have become indispensable components in training deep neural networks, resulting in faster training and improved generalization performance. ...
By doing so, PER minimizes the upper bound of the Wasserstein distance of order one between an empirical distribution of activations and the standard normal distribution. ...
Among various techniques of controlling activations, one well-known and successful path is controlling their first and second moments. ...
arXiv:2002.05366v2
fatcat:c5iij45n4bg2fob6beffmu475i
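Note on the entry above: for scalar activations, the order-one Wasserstein distance to the standard normal can be estimated directly from sorted samples and normal quantiles. The sketch below computes only that reference quantity; it is not the PER regularizer or the upper bound derived in the paper, and all names are ours.

# Illustrative only: estimate the order-1 Wasserstein distance between an
# empirical sample of activations and the standard normal via the quantile
# representation W1(P, Q) = integral_0^1 |F_P^{-1}(u) - F_Q^{-1}(u)| du.
import numpy as np
from scipy.stats import norm

def w1_to_standard_normal(activations):
    a = np.sort(np.ravel(activations))
    n = a.size
    u = (np.arange(n) + 0.5) / n          # midpoint quantile levels
    return np.mean(np.abs(a - norm.ppf(u)))

rng = np.random.default_rng(0)
pre_act = rng.normal(loc=1.0, scale=2.0, size=10000)    # hypothetical layer activations
print(w1_to_standard_normal(pre_act))                   # large: poorly matched to N(0, 1)
print(w1_to_standard_normal(rng.normal(size=10000)))    # small: well matched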
The Role of Linear Layers in Nonlinear Interpolating Networks
[article]
2022
arXiv
pre-print
The representation cost of a function induced by a neural network architecture is the minimum sum of squared weights needed for the network to represent the function; it reflects the function space bias ...
This paper explores the implicit bias of overparameterized neural networks of depth greater than two layers. ...
L = 2 (i.e., a single hidden-layer ReLU network with no additional linear layers), we have $\Phi_2(\mathbf{W}, \mathbf{a}) = \sum_{k=1}^{K} |a_k|\,\|\mathbf{w}_k\|_2$ (13). This has been referred to as the "path norm" by Neyshabur et al. ( ...
arXiv:2202.00856v1
fatcat:jy73l6zsqfdhjm6zu4dyynnr7y
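Note on the entry above: to make the reconstructed equation (13) concrete, the following lines (our own illustration with hypothetical variable names) evaluate $\Phi_2(\mathbf{W}, \mathbf{a}) = \sum_k |a_k|\,\|\mathbf{w}_k\|_2$ for a single hidden-layer ReLU network mapping x to the sum over k of a_k * relu(w_k . x).

# Sketch of equation (13) for a single hidden-layer ReLU network:
# Phi_2(W, a) = sum_k |a_k| * ||w_k||_2. Variable names are ours.
import numpy as np

rng = np.random.default_rng(0)
K, d = 8, 5
W = rng.normal(size=(K, d))    # rows w_k: input weights of the K hidden units
a = rng.normal(size=K)         # output weights a_k

phi_2 = np.sum(np.abs(a) * np.linalg.norm(W, axis=1))
print(phi_2)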
Towards Non-Saturating Recurrent Units for Modelling Long-Term Dependencies
2019
PROCEEDINGS OF THE THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-19)
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training, as the sequence length increases. ...
compared against a range of other architectures. ...
Introduction: Vanishing and exploding gradients remain a core challenge in the training of recurrent neural networks. ...
doi:10.1609/aaai.v33i01.33013280
fatcat:nc3rcfbhknhvbc7iom5hpwswty
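Note on the entry above: the vanishing-gradient behaviour it refers to can be reproduced in a few lines. Backpropagation through time multiplies the error signal by the recurrent Jacobian once per step, and with a saturating activation such as tanh the norm of that product typically decays geometrically with sequence length. The toy simulation below is our own and is unrelated to the recurrent units proposed in the paper.

# Toy demonstration of vanishing gradients in a plain tanh RNN (illustrative
# only). Run a forward pass, then propagate an error vector backwards through
# the per-step Jacobians diag(1 - h_t^2) @ W and watch its norm shrink.
import numpy as np

rng = np.random.default_rng(0)
n, T = 64, 200
W = rng.normal(scale=0.8 / np.sqrt(n), size=(n, n))    # recurrent weights

# Forward pass with random inputs, storing the hidden states.
h, states = np.zeros(n), []
for t in range(T):
    h = np.tanh(W @ h + rng.normal(size=n))
    states.append(h)

# Backward pass: grad_{h_{t-1}} = W^T diag(1 - h_t^2) grad_{h_t}.
grad = np.ones(n)
for t in reversed(range(T)):
    grad = W.T @ ((1.0 - states[t] ** 2) * grad)
    if t % 50 == 0:
        print(f"after backprop to step {t}: ||grad|| = {np.linalg.norm(grad):.3e}")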
Sample Compression, Support Vectors, and Generalization in Deep Learning
[article]
2020
arXiv
pre-print
Then, using a max-margin assumption, the paper develops a sample compression representation of the neural network in terms of the discrete activation state of neurons induced by s "support vectors". ...
The paper shows that the number of support vectors s relates to learning guarantees for neural networks through sample compression bounds, yielding a sample complexity of O(ns/epsilon) for networks with ...
For example, [16] analyzes the number of nonzero weights as a form of capacity control, while others have studied approximating a deep network by a "compressed" version with fewer nonzero weights [17 ...
arXiv:1811.02067v4
fatcat:5irjhduq6zew3ocg36zz3zrmgm
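Note on the entry above: the "discrete activation state of neurons" it refers to is, for a ReLU network, the binary pattern recording which units have positive pre-activation on a given input. The sketch below (hypothetical names, not the authors' code) extracts that pattern for a small fully connected network.

# Illustration of the discrete ReLU activation state: for each input, record
# which hidden units are active (pre-activation > 0) in every layer. These
# binary patterns are the objects the compression argument reasons about.
import numpy as np

def activation_pattern(weights, biases, x):
    pattern = []
    h = x
    for W, b in zip(weights, biases):
        pre = W @ h + b
        pattern.append(pre > 0)        # binary state of this layer's neurons
        h = np.maximum(pre, 0.0)       # ReLU forward pass
    return pattern, h

rng = np.random.default_rng(0)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(4, 16))]
biases = [rng.normal(size=16), rng.normal(size=4)]
pattern, out = activation_pattern(weights, biases, rng.normal(size=8))
print([p.astype(int) for p in pattern])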
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks
[article]
2021
arXiv
pre-print
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution. ...
Our theoretical analysis builds on a connection of over-parameterized networks to the neural tangent kernel. Empirically, our theory holds across different training settings. ...
This research was supported by NSF CAREER award 1553284, NSF III 1900933, and a Chevron-MIT Energy Fellowship. This research was also supported by JST ERATO JPMJER1201 and JSPS Kakenhi JP18H05291. ...
arXiv:2009.11848v5
fatcat:spopbpkhwfcelksb4qf264htfa
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
[article]
2019
arXiv
pre-print
Modelling long-term dependencies is a challenge for recurrent neural networks. This is primarily due to the fact that gradients vanish during training, as the sequence length increases. ...
compared against a range of other architectures. ...
Introduction: Vanishing and exploding gradients remain a core challenge in the training of recurrent neural networks. ...
arXiv:1902.06704v1
fatcat:z5mgipefpjeyhmfnyecbrgpqbu
Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth
2021
Neural Computation
This network is built with the Floor ($\lfloor x \rfloor$) or ReLU ($\max\{0, x\}$) activation function in each neuron; hence, we call such networks Floor-ReLU networks. ...
As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r \to 0$ is moderate (e.g., ...
The third and fourth terms of equation 2.2 are usually bounded in terms of the sample size n and a certain norm of $\theta_N$ and $\theta_D$ (e.g., $\ell_1$, $\ell_2$, or the path norm), respectively. ...
doi:10.1162/neco_a_01364
pmid:33513325
fatcat:anugapaxrnbvpp5pgnpvpzmo6m
A Correspondence between Normalization Strategies in Artificial and Biological Neural Networks
2021
Neural Computation
As a proof of concept, we develop an algorithm, inspired by a neural normalization technique called synaptic scaling, and show that this algorithm performs competitively against existing normalization ...
A fundamental challenge at the interface of machine learning and neuroscience is to uncover computational principles that are shared between artificial and biological neural networks. ...
S.N. was supported by the Pew Charitable Trusts, the National Institutes of Health under awards 1R01DC017695 and 1UF1NS111692, and funding from the Simons Center for Quantitative Biology at Cold Spring ...
doi:10.1162/neco_a_01439
pmid:34474484
pmcid:PMC8662716
fatcat:x5tsd6y3ijhtripppasnaa6eze
Showing results 1 — 15 out of 860 results