
On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [article]

Itay Safran, Gal Vardi, Jason D. Lee
2022 arXiv   pre-print
We study the dynamics and implicit bias of gradient flow (GF) on univariate ReLU neural networks with a single hidden layer in a binary classification setting.  ...  We show that when the labels are determined by the sign of a target network with r neurons, with high probability over the initialization of the network and the sampling of the dataset, GF converges in  ...  Acknowledgements We thank Noam Razin and Gilad Yehudai for pointing out several relevant papers to discuss in the related work section.  ...
arXiv:2205.09072v1 fatcat:njby7auyrjcjljujsc7v5jqk4y
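The quantity studied here, the effective number of linear regions of a one-hidden-layer univariate ReLU network, can be made concrete with a short numpy sketch. This is my illustration, not the paper's code: it builds f(x) = sum_i a_i * ReLU(w_i x + b_i) with random parameters and counts how many potential breakpoints actually change the slope; the width k, the tolerances, and all variable names are arbitrary assumptions. (At random initialization essentially every breakpoint is active; the paper's question is how many remain effective under gradient flow.)

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer univariate ReLU network: f(x) = sum_i a_i * relu(w_i * x + b_i)
k = 20                                          # hidden width (illustrative)
w, b, a = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)

def f(x):
    return np.maximum(np.outer(x, w) + b, 0.0) @ a

# Each neuron can change the slope only at its breakpoint x_i = -b_i / w_i.
breakpoints = np.sort(-b[w != 0] / w[w != 0])

# Probe one point per interval between consecutive breakpoints, measure the
# slope there, and count how often the slope actually changes.
probes = np.concatenate(([breakpoints[0] - 1.0],
                         (breakpoints[:-1] + breakpoints[1:]) / 2,
                         [breakpoints[-1] + 1.0]))
eps = 1e-6
slopes = (f(probes + eps) - f(probes)) / eps
effective_regions = 1 + int(np.sum(~np.isclose(slopes[1:], slopes[:-1], atol=1e-8)))
print(f"{k} neurons, {breakpoints.size} breakpoints, "
      f"{effective_regions} effective linear regions")
```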

Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics

Justin Sahs, Ryan Pyle, Aneel Damaraju, Josue Ortega Caro, Onur Tavaslioglu, Andy Lu, Fabio Anselmi, Ankit B. Patel
2022 Frontiers in Artificial Intelligence  
Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena.  ...  Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented.  ...  In particular, we focus on shallow fully connected univariate ReLU networks, whose parameters will always result in a Continuous Piecewise Linear (CPWL) output.  ... 
doi:10.3389/frai.2022.889981 pmid:35647529 pmcid:PMC9131019 fatcat:enhnaabvtraxdlxd4lfna3cf6e
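The "spline lens" of this entry has a compact statement: a one-hidden-layer univariate ReLU network is a continuous piecewise-linear (CPWL) function whose knots sit at x_i = -b_i / w_i, and crossing knot i from left to right changes the slope by a_i * |w_i|. The sketch below is my own check of that reparameterization against direct evaluation, not the authors' code; the width and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 8
w, b, a = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)

def net(x):
    """f(x) = sum_i a_i * relu(w_i * x + b_i)."""
    return np.maximum(np.outer(x, w) + b, 0.0) @ a

# Spline view: unit i places a knot at -b_i / w_i; crossing it left-to-right
# changes the slope by a_i * |w_i| (for either sign of w_i).
order = np.argsort(-b / w)
knots = (-b / w)[order]
slope_changes = (a * np.abs(w))[order]

# Left of all knots only the units with w_i < 0 are active.
left_slope = np.sum((a * w)[w < 0])
segment_slopes = left_slope + np.concatenate(([0.0], np.cumsum(slope_changes)))

# Verify against finite-difference slopes measured inside each segment.
probes = np.concatenate(([knots[0] - 1.0], (knots[:-1] + knots[1:]) / 2, [knots[-1] + 1.0]))
eps = 1e-6
fd_slopes = (net(probes + eps) - net(probes)) / eps
print(np.allclose(segment_slopes, fd_slopes, atol=1e-4))    # expected: True
```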

Piecewise linear neural networks and deep learning

Qinghua Tao, Li Li, Xiaolin Huang, Xiangming Xi, Shuning Wang, Johan A. K. Suykens
2022 Nature Reviews Methods Primers  
In 2010, the advent of the Rectified Linear Unit (ReLU) spurred the prevalence of PWLNNs in deep learning.  ...  In this Primer, we systematically introduce the methodology of PWLNNs by grouping the works into shallow and deep networks.  ...  Acknowledgements This work is jointly supported by ERC Advanced Grant E-DUALITY (787960), KU Leuven Grant CoE PFV/10/002, and Grant FWO G0A4917N, EU H2020 ICT-48 Network TAILOR (Foundations of Trustworthy  ...
doi:10.1038/s43586-022-00125-7 fatcat:zfx7eyld2bghte2rm3tsogm5qq

Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization [article]

Tomaso Poggio, Andrzej Banburski, Qianli Liao
2019 arXiv   pre-print
In approximation theory both shallow and deep networks have been shown to approximate any continuous function on a bounded domain at the expense of an exponential number of parameters (exponential in  ...  However, for a subset of compositional functions, deep networks of the convolutional type can have a linear dependence on dimensionality, unlike shallow networks.  ...  There are no biases apart from the input layer, where the bias is instantiated by one of the input dimensions being a constant. The activation function in this section is the ReLU activation.  ...
arXiv:1908.09375v1 fatcat:mozi3aotovhmpdd2kzimyi5lbu

The Implicit Bias of Minima Stability: A View from Function Space

Rotem Mulayoff, Tomer Michaeli, Daniel Soudry
2021 Neural Information Processing Systems  
In this paper we study the effect that this mechanism has on the function implemented by the trained model.  ...  We then use our stability results to study a single hidden layer univariate ReLU network.  ...  Acknowledgements The research of Rotem Mulayoff was supported by the Planning and Budgeting Committee the Israeli Council for Higher Education, and by the Andrew and Erna Finci Viterbi Graduate Fellowship  ... 
dblp:conf/nips/MulayoffMS21 fatcat:xb4qlthhh5chdh2dwh2ilwufdi
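The stability mechanism referred to here rests on a standard linear-stability fact: gradient descent with step size eta can only remain at a minimum (where the loss is locally twice differentiable) whose Hessian satisfies lambda_max <= 2/eta, so larger step sizes exclude sharper minima. The following is only a minimal numerical sketch of that criterion on a tiny one-hidden-layer univariate ReLU network with squared loss, not the paper's construction; the finite-difference Hessian, the step size eta = 0.05, and all sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny one-hidden-layer univariate ReLU net trained by plain gradient descent.
X = np.linspace(-1.0, 1.0, 8)
Y = np.abs(X)
k = 6                                           # hidden width (illustrative)

def loss(theta):
    w, b, a = theta[:k], theta[k:2 * k], theta[2 * k:]
    pred = np.maximum(np.outer(X, w) + b, 0.0) @ a
    return 0.5 * np.mean((pred - Y) ** 2)

def num_grad(theta, eps=1e-6):
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (loss(theta + e) - loss(theta - e)) / (2 * eps)
    return g

def num_hess(theta, eps=1e-5):
    p = theta.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = eps
            ej = np.zeros(p); ej[j] = eps
            H[i, j] = (loss(theta + ei + ej) - loss(theta + ei - ej)
                       - loss(theta - ei + ej) + loss(theta - ei - ej)) / (4 * eps ** 2)
    return H

eta = 0.05
theta = rng.normal(scale=0.5, size=3 * k)
for _ in range(5000):
    theta -= eta * num_grad(theta)

# Linear-stability criterion: a minimum that GD can stay at must have
# lambda_max <= 2/eta (the estimate is only meaningful away from ReLU kinks).
lam_max = np.linalg.eigvalsh(num_hess(theta)).max()
print(f"lambda_max = {lam_max:.3f}   vs   2/eta = {2 / eta:.1f}")
```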

Shallow Univariate ReLu Networks as Splines: Initialization, Loss Surface, Hessian, Gradient Flow Dynamics [article]

Justin Sahs, Ryan Pyle, Aneel Damaraju, Josue Ortega Caro, Onur Tavaslioglu, Andy Lu, Ankit Patel
2020 arXiv   pre-print
Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function represented.  ...  Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena.  ...  We focus on the case of a univariate fully connected shallow ReLU network.  ... 
arXiv:2008.01772v1 fatcat:6m77gpamfzgl5is2pj7uxmxypy

Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks

Russell Tsuchida, Tim Pearce, Chris Van der Heide, Fred Roosta, Marcus Gallagher
2021 Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-21)
Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian  ...  The fixed point behaviour present in some networks explains a mechanism for implicit regularisation in overparameterised deep models.  ...  Acknowledgements This work was partially funded by CSIRO's Machine Learning and Artificial Intelligence Future Science Platform  ...
doi:10.1609/aaai.v35i11.17197 fatcat:xr3qaxd6vnfcda44umw7gxv374
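The "covariance functions" in question are the infinite-width (Gaussian-process) kernels E_{w,b}[phi(w.x + b) * phi(w.x' + b)] with Gaussian weights. The paper derives them in closed form for ELU and GELU; the sketch below is only my Monte Carlo estimate of a single GELU layer's kernel, enough to probe its behaviour on a pair of inputs. The choices sigma_w = 1, sigma_b = 0, the sample count, and the function names are assumptions.

```python
import numpy as np
from scipy.special import erf

def gelu(z):
    # GELU(z) = z * Phi(z), with Phi the standard normal CDF.
    return z * 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

def nngp_cov_mc(x, xp, sigma_w=1.0, sigma_b=0.0, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the single-layer infinite-width covariance
    E_{w,b}[gelu(w.x + b) * gelu(w.xp + b)], with w ~ N(0, sigma_w^2 I / d)."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=sigma_w / np.sqrt(d), size=(n_samples, d))
    bias = rng.normal(scale=sigma_b, size=n_samples) if sigma_b > 0 else 0.0
    return np.mean(gelu(W @ x + bias) * gelu(W @ xp + bias))

x = np.array([1.0, 0.0, 0.0])
xp = np.array([np.cos(0.3), np.sin(0.3), 0.0])   # same norm, small angle
print(nngp_cov_mc(x, x), nngp_cov_mc(x, xp))
```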

A Fine-Grained Spectral Perspective on Neural Networks [article]

Greg Yang, Hadi Salman
2020 arXiv   pre-print
Are neural networks biased toward simple functions? Does depth always help learn more complex features? Is training the last layer of a network as good as training all layers?  ...  We derive fast algorithms for computing the spectra of CK and NTK when the data is uniformly distributed over the boolean cube, and show that these spectra are the same in high dimensions when data is drawn from  ...  A study of their spectra thus informs us of the "implicit prior" of a randomly initialized neural network as well as the "implicit bias" of GD in the context of training neural networks.  ...
arXiv:1907.10599v4 fatcat:chd252ng6bhqrcfwpeqapb47wu
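CK and NTK here are the conjugate (infinite-width Gaussian-process) kernel and the neural tangent kernel. The paper computes their spectra on the boolean cube exactly; the sketch below is just my empirical stand-in: it forms the finite-width NTK Gram matrix of a random two-layer ReLU network on all points of {-1, +1}^d and eigendecomposes it. The dimensions d = 8, width k = 4096, and the parameterization are assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
d, k = 8, 4096                          # input dimension, hidden width (illustrative)

# All 2^d points of the boolean cube {-1, +1}^d.
X = np.array(list(product([-1.0, 1.0], repeat=d)))

# Random two-layer ReLU net in NTK parameterization:
#   f(x) = (1/sqrt(k)) * sum_i a_i * relu(w_i . x / sqrt(d))
W = rng.normal(size=(k, d))
a = rng.normal(size=k)

U = X @ W.T / np.sqrt(d)                # pre-activations, shape (2^d, k)
phi, dphi = np.maximum(U, 0.0), (U > 0).astype(float)

# Empirical NTK: Theta(x, x') = grad_theta f(x) . grad_theta f(x')
Theta = (phi @ phi.T) / k + ((dphi * a) @ (dphi * a).T) * (X @ X.T / d) / k

eig = np.linalg.eigvalsh(Theta)[::-1]
print("leading NTK eigenvalues on the boolean cube:", np.round(eig[:8], 3))
```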

Avoiding Kernel Fixed Points: Computing with ELU and GELU Infinite Networks [article]

Russell Tsuchida, Tim Pearce, Chris van der Heide, Fred Roosta, Marcus Gallagher
2021 arXiv   pre-print
Firstly, we derive the covariance functions of multi-layer perceptrons (MLPs) with exponential linear units (ELU) and Gaussian error linear units (GELU) and evaluate the performance of the limiting Gaussian  ...  The fixed point behaviour present in some networks explains a mechanism for implicit regularisation in overparameterised deep models.  ...  CvdH was supported by ACEMS under ARC grant number CE140100049. We would like to thank Bob Williamson for a helpful discussion.  ... 
arXiv:2002.08517v3 fatcat:5dleuftuajgezgdy6qrilrxws4

Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint

Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Denny Wu, Tianzong Zhang
2020 International Conference on Learning Representations  
This paper investigates the generalization properties of two-layer neural networks in high-dimensions, i.e. when the number of samples n, features d, and neurons h tend to infinity at the same rate.  ...  In contrast, when the first layer weights are optimized, we highlight how different scales of initialization lead to different inductive bias, and show that the resulting risk is independent of overparameterization  ...  JB and DW were partially funded by LG Electronics and NSERC. JB and MAE were supported by the CIFAR AI Chairs program.  ... 
dblp:conf/iclr/BaESWZ20 fatcat:5iidigooand4lf34vakk754gua
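The proportional regime the abstract describes (n, d, h growing at the same rate) is easiest to poke at in the setting where the first layer is frozen and only the output layer is fit, i.e. a random-features model. The sketch below is my own toy experiment in that spirit, not the paper's analysis; the linear teacher, the noise level, the ridge parameter lam, and the ratios h/n are all assumptions.

```python
import numpy as np

def rf_test_risk(n, d, h, lam=1e-3, seed=0):
    """Random-features model: freeze a random first layer, fit only the second
    layer by ridge regression, and return the test risk on fresh samples."""
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=d) / np.sqrt(d)            # linear teacher
    W = rng.normal(size=(d, h)) / np.sqrt(d)          # frozen first layer

    def sample(m):
        X = rng.normal(size=(m, d))
        return X, X @ beta + 0.1 * rng.normal(size=m)

    def features(X):
        return np.maximum(X @ W, 0.0)                 # ReLU random features

    Xtr, ytr = sample(n)
    Xte, yte = sample(2000)
    Z = features(Xtr)
    theta = np.linalg.solve(Z.T @ Z + lam * np.eye(h), Z.T @ ytr)
    return np.mean((features(Xte) @ theta - yte) ** 2)

n = d = 400
for h in (100, 400, 1600):                            # vary the ratio h/n
    print(f"h/n = {h / n:.2f}: test risk = {rf_test_risk(n, d, h):.3f}")
```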

Mean-field Analysis of Piecewise Linear Solutions for Wide ReLU Networks [article]

Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli
2022 arXiv   pre-print
Our main result is that SGD is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of "knot" points - i.e., points where the  ...  In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique  ...  Acknowledgements We would like to thank Mert Pilanci for several exploratory discussions in the early stage of the project, Jan Maas for clarifications about [JKO98] , and Max Zimmer for suggestive numerical  ... 
arXiv:2111.02278v2 fatcat:6d2crxvc5fdlzie54g2ybdis24
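The quantity in this abstract, the number of "knot" points of the trained network, can at least be measured directly. The sketch below is my toy illustration only; it does not reproduce the paper's mean-field limit or its bound. It trains f(x) = (1/k) * sum_i a_i * ReLU(w_i x + b_i) with plain one-sample SGD on a handful of points and then counts breakpoints that land inside the data range with a non-negligible slope change. Every hyperparameter (width, step size, threshold) is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(3)

# A few univariate samples from a piecewise-linear target.
Xd = np.linspace(-1.0, 1.0, 10)
Yd = np.abs(Xd) - 0.5

# Mean-field-style parameterization: f(x) = (1/k) * sum_i a_i * relu(w_i * x + b_i)
k = 200
w, b, a = rng.normal(size=k), rng.normal(size=k), rng.normal(size=k)

eta, steps = 1.0, 100_000
for _ in range(steps):                        # plain one-sample SGD, squared loss
    i = rng.integers(Xd.size)
    x, y = Xd[i], Yd[i]
    pre = w * x + b
    act = np.maximum(pre, 0.0)
    r = act @ a / k - y                       # residual on this sample
    mask = (pre > 0).astype(float)
    ga, gw, gb = r * act / k, r * a * mask * x / k, r * a * mask / k
    a -= eta * ga
    w -= eta * gw
    b -= eta * gb

# Count breakpoints inside the data range whose slope change |a_i * w_i| / k
# exceeds a (hypothetical) threshold.
knots = -b / w
strength = np.abs(a * w) / k
inside = (knots > Xd.min()) & (knots < Xd.max())
print(f"{k} neurons; {int(np.sum(inside & (strength > 1e-3)))} knots inside the "
      f"data range with slope change above 1e-3")
```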

Early-stopped neural networks are consistent [article]

Ziwei Ji, Justin D. Li, Matus Telgarsky
2021 arXiv   pre-print
This work studies the behavior of shallow ReLU networks trained with the logistic loss via gradient descent on binary classification data where the underlying data distribution is general, and the (optimal  ...  In this setting, it is shown that gradient descent with early stopping achieves population risk arbitrarily close to optimal in terms of not just logistic and misclassification losses, but also in terms  ...  Acknowledgments The authors are grateful for support from the NSF under grant IIS-1750051.  ... 
arXiv:2106.05932v2 fatcat:opmyk6skybcnlgbke6lk5hsd5u
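The protocol analyzed here can be written down as a runnable sketch (mine, not the authors'): a shallow ReLU network trained by full-batch gradient descent on the logistic loss, with early stopping at the iterate of lowest held-out misclassification error. The two-Gaussian data distribution (which has nonzero Bayes risk), the width, the step size, and the stopping-rule details are all assumptions.

```python
import numpy as np
from scipy.special import expit               # numerically stable sigmoid

rng = np.random.default_rng(4)

def sample(n):
    """Two overlapping Gaussian classes, so the optimal (Bayes) risk is nonzero."""
    y = rng.choice([-1.0, 1.0], size=n)
    X = y[:, None] * np.array([1.0, 0.5]) + rng.normal(size=(n, 2))
    return X, y

Xtr, ytr = sample(400)
Xval, yval = sample(400)

k = 64                                        # hidden width (illustrative)
W = rng.normal(size=(k, 2)) / np.sqrt(2)
a = rng.normal(size=k) / np.sqrt(k)

def predict(X):
    return np.maximum(X @ W.T, 0.0) @ a       # shallow ReLU net, no biases

eta, best_err, best = 0.1, np.inf, None
for t in range(2000):                         # full-batch GD on the logistic loss
    Z = Xtr @ W.T
    A = np.maximum(Z, 0.0)
    s = -ytr * expit(-ytr * (A @ a)) / len(ytr)   # d(mean logistic loss)/d(output)
    grad_a = A.T @ s
    grad_W = (((Z > 0) * a) * s[:, None]).T @ Xtr
    a -= eta * grad_a
    W -= eta * grad_W

    val_err = np.mean(np.sign(predict(Xval)) != yval)
    if val_err < best_err:                    # early stopping: keep the best iterate
        best_err, best = val_err, (W.copy(), a.copy())

W, a = best
print(f"held-out misclassification at the early-stopped iterate: {best_err:.3f}")
```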

Theory of Deep Learning III: explaining the non-overfitting puzzle [article]

Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar
2018 arXiv   pre-print
Gradient descent enforces a form of implicit regularization controlled by the number of iterations, and asymptotically converges to the minimum norm solution for appropriate initial conditions of gradient  ...  In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to linear  ...  CBMM acknowledges the support of NVIDIA Corporation with the donation of the DGX-1 used in part for this research. HNM is supported in part by ARO Grant W911NF-15-1-0385  ... 
arXiv:1801.00173v2 fatcat:gt6wixwtzbgvpmzokcicbg3koe
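The asymptotic claim quoted above, that gradient descent converges to the minimum-norm solution for appropriate initial conditions, is easy to verify numerically in the linear, over-parameterized least-squares setting the note reduces to. Below is a minimal sketch of that fact (mine, with arbitrary sizes), comparing GD started from zero with the pseudoinverse solution.

```python
import numpy as np

rng = np.random.default_rng(5)

# Under-determined least squares: more parameters than equations.
n, p = 20, 100
A = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Gradient descent on 0.5 * ||A x - y||^2, started from zero.
x = np.zeros(p)
eta = 1.0 / np.linalg.norm(A, 2) ** 2          # safe step size (< 2 / sigma_max^2)
for _ in range(50_000):
    x -= eta * A.T @ (A @ x - y)

x_min_norm = np.linalg.pinv(A) @ y             # minimum-norm interpolant
print(np.linalg.norm(x - x_min_norm))          # expected: ~0 (GD stays in the row space of A)
print(np.linalg.norm(A @ x - y))               # both solutions interpolate the data
```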

Analytical Probability Distributions and EM-Learning for Deep Generative Networks [article]

Randall Balestriero, Sebastien Paris, Richard G. Baraniuk
2020 arXiv   pre-print
In the absence of a known analytical form for the posterior and likelihood expectation, VAEs resort to approximations, including (Amortized) Variational Inference (AVI) and Monte-Carlo (MC) sampling.  ...  Deep Generative Networks (DGNs) with probabilistic modeling of their output and latent space are currently trained via Variational Autoencoders (VAEs).  ...  Note that this result generalizes the result of [41] which related linear and shallow DGNs to PPCA, as in the linear regime one has g(z) = Wz + b + ε.  ...
arXiv:2006.10023v1 fatcat:2gbjnqfyxfchjktuqbyhewwlaq
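The snippet's remark that a linear generator reduces to probabilistic PCA can be checked directly: with z ~ N(0, I) and Gaussian noise ε ~ N(0, σ²I), the marginal of g(z) = Wz + b + ε is N(b, WWᵀ + σ²I), which is exactly the PPCA model. The sketch below is my sampling check of that covariance identity; the dimensions, σ, and sample size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)

# Linear generator g(z) = W z + b + eps with z ~ N(0, I), eps ~ N(0, sigma^2 I).
d_x, d_z, sigma = 5, 2, 0.3
W = rng.normal(size=(d_x, d_z))
b = rng.normal(size=d_x)

n = 200_000
Z = rng.normal(size=(n, d_z))
X = Z @ W.T + b + sigma * rng.normal(size=(n, d_x))

# PPCA marginal: x ~ N(b, W W^T + sigma^2 I). Compare covariances.
emp_cov = np.cov(X, rowvar=False)
ppca_cov = W @ W.T + sigma ** 2 * np.eye(d_x)
print(np.max(np.abs(emp_cov - ppca_cov)))      # expected: small (Monte Carlo error)
```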

The Shallow Gibbs Network, Double Backpropagation and Differential Machine learning

Nonvikan Karl-Augustt Alahassa, Alejandro Murua
2021 Zenodo  
We have built a Shallow Gibbs Network model as a Random Gibbs Network Forest to reach the performance of the Multilayer feedforward Neural Network with a small number of parameters and fewer backpropagation  ...  rate, and which is convergent and universally applicable to any Bayesian neural network problem.  ...  In the Shallow Potts, when you increase the number of base learners (the number of neurons in the hidden layer), the train and test errors increase.  ...
doi:10.5281/zenodo.4683035 fatcat:hnjuqicds5db7mkhwuwjhcjm5e
Showing results 1–15 of 59.