
On the infinite width limit of neural networks with a standard parameterization [article]

Jascha Sohl-Dickstein, Roman Novak, Samuel S. Schoenholz, Jaehoon Lee
2020 arXiv   pre-print
The standard parameterization leads to a divergent neural tangent kernel while the NTK parameterization fails to capture crucial aspects of finite width networks such as: the dependence of training dynamics  ...  There are currently two parameterizations used to derive fixed kernels corresponding to infinite width neural networks, the NTK (Neural Tangent Kernel) parameterization and the naive standard parameterization  ...  This leads to a well-behaved infinite-width limit, but involves a number of inconsistencies relative to standard neural networks. $z^{l+1} = \frac{\sigma}{\sqrt{N^l}} \omega^l y^l + b^l$ (2) The core idea here is to write the width  ... 
arXiv:2001.07301v3 fatcat:tgseieby2rdfxp5ukzu6573cfq
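Not from the paper itself; a minimal numpy sketch of the two parameterizations the snippet contrasts for a single dense layer. In the standard parameterization the 1/sqrt(width) factor is baked into the weight initialization, while in the NTK parameterization the weights stay O(1) and the factor appears in the forward pass, as in equation (2) above. Width and seed are arbitrary toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024                          # layer width (fan-in), toy value
x = rng.standard_normal(N)        # unit-variance input activations

# Standard parameterization: variance 1/N is baked into the weight
# initialization; the forward pass applies W directly.
W_std = rng.standard_normal((N, N)) / np.sqrt(N)
z_std = W_std @ x

# NTK parameterization: weights are drawn O(1); the 1/sqrt(N) factor
# is applied in the forward pass instead.
W_ntk = rng.standard_normal((N, N))
z_ntk = (W_ntk / np.sqrt(N)) @ x

# At initialization the two give preactivations with the same O(1)
# variance; they differ in how gradient descent scales per parameter.
print(z_std.var(), z_ntk.var())
```

Both variances come out near 1, which is the point: the parameterizations agree at initialization and diverge only in their training dynamics.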

Finite Versus Infinite Neural Networks: an Empirical Study [article]

Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
2020 arXiv   pre-print
rate break the correspondence between finite and infinite networks; the NTK parameterization outperforms the standard parameterization for finite width networks; diagonal regularization of kernels acts  ...  By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks.  ...  Acknowledgments and Disclosure of Funding We thank Yasaman Bahri and Ethan Dyer for discussions and feedback on the project.  ... 
arXiv:2007.15801v2 fatcat:6ervrlzxybgeteh4cpdytu3w2q

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent [article]

Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington
2019 arXiv   pre-print
In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order  ...  While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized  ...  We are grateful to Daniel Freeman, Alex Irpan and anonymous reviewers for providing valuable feedback on the draft.  ... 
arXiv:1902.06720v4 fatcat:avq7gghjwbgejn4ykozygddjsy
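A toy numpy illustration (not the paper's code) of the linearization the snippet describes: for a wide one-hidden-layer ReLU network, the first-order Taylor expansion of the output in the parameters tracks the full network closely under a small parameter step. The width, step size, and scaling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 2048, 8                          # hidden width, input dim (toy)
x = rng.standard_normal(d)
W0 = rng.standard_normal((N, d))
v0 = rng.standard_normal(N)

def f(W, v):
    # one-hidden-layer ReLU network in NTK scaling
    return v @ np.maximum(W @ x, 0.0) / np.sqrt(N)

# Parameter-gradients of f at initialization (the Jacobian)
h0 = W0 @ x
dW = np.outer(v0 * (h0 > 0), x) / np.sqrt(N)   # df/dW
dv = np.maximum(h0, 0.0) / np.sqrt(N)          # df/dv

# A small parameter step, of the size one gradient update might take
eps = 0.1
dW_step = eps * rng.standard_normal((N, d)) / np.sqrt(N * d)
dv_step = eps * rng.standard_normal(N) / np.sqrt(N)

# Full network vs. linearized model f0 + J . (theta - theta0)
full = f(W0 + dW_step, v0 + dv_step)
lin = f(W0, v0) + np.sum(dW * dW_step) + dv @ dv_step
print(abs(full - lin))   # small when the network is wide
```

The gap shrinks as the width grows, which is the mechanism behind the paper's claim that wide networks evolve as linear models.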

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization [article]

Wei Huang and Weitao Du and Richard Yi Da Xu
2021 arXiv   pre-print
Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite.  ...  In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs) with orthogonal initialization  ...  Studies involving the NTK commonly adopt the NTK parameterization [20], since the standard parameterization can lead to a divergent gradient flow in the infinite-width limit.  ... 
arXiv:2004.05867v4 fatcat:rdzo5k6icbagjlw24xo4lixonm

Infinitely Wide Tensor Networks as Gaussian Process [article]

Erdong Guo, David Draper
2021 arXiv   pre-print
It is known that by introducing an appropriate prior over the weights of a neural network, a Gaussian Process can be obtained by taking the infinite-width limit of the Bayesian neural networks from a Bayesian  ...  (We note here that a Gaussian Process can also be obtained by taking the infinite limit of at least one of the bond dimensions α_i in the product of tensor nodes, and the proofs can be done with the same  ...  Erdong Guo is grateful for the financial support by Ming-Ren Teahouse and Uncertainty Quantification LLC for this work.  ... 
arXiv:2101.02333v1 fatcat:k2lwxtz2xve7jm2kr2coirjquq

Taylorized Training: Towards Better Approximation of Neural Network Training at Finite Width [article]

Yu Bai, Ben Krause, Huan Wang, Caiming Xiong, Richard Socher
2020 arXiv   pre-print
Taylorized training involves training the k-th order Taylor expansion of the neural network at initialization, and is a principled extension of linearized training, a recently proposed theory for understanding  ...  We experiment with Taylorized training on modern neural network architectures, and show that Taylorized training (1) agrees with full neural network training increasingly better as we increase k, and (  ...  It is possible to achieve stronger results with finite-width linearized networks by using the NTK parameterization, which more closely resembles the infinite width limit.  ... 
arXiv:2002.04010v2 fatcat:hsn6b3qoxndj3irwb6wh7puxum
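A toy illustration (not the paper's setup) of the order-k expansion idea: for a scalar "model" f(w) = exp(w·x), chosen only because all its parameter-derivatives are analytic, the order-k Taylorized model around w0 matches the full function increasingly well as k grows, mirroring the paper's claim (1).

```python
import numpy as np
from math import factorial

# Toy scalar model f(w) = exp(w * x); the j-th derivative in the
# parameter w is x**j * exp(w * x), so the order-k expansion is exact
# to write down.
x, w0, dw = 0.7, 0.3, 0.5

def taylorized(k):
    # order-k Taylor expansion of f in w around w0, evaluated at w0 + dw
    return sum(x**j * np.exp(w0 * x) * dw**j / factorial(j)
               for j in range(k + 1))

full = np.exp((w0 + dw) * x)
errs = [abs(full - taylorized(k)) for k in (1, 2, 3)]
print(errs)   # approximation error shrinks as the order k grows
```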

Explaining Neural Scaling Laws [article]

Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma
2021 arXiv   pre-print
The test loss of well-trained neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network.  ...  The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are  ...  US completed a portion of this work during an internship at Google. JK and US were supported in part by Open Philanthropy.  ... 
arXiv:2102.06701v1 fatcat:v6kwrfyua5g57gajwx3ah7cq5y
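Not from the paper; a minimal sketch of how such a power-law scaling relation is measured in practice. A power law L(N) = a·N^(-α) is a straight line in log-log space, so the exponent can be recovered by linear regression; the values of a and α below are arbitrary toy choices, not the paper's measured exponents.

```python
import numpy as np

# Synthetic test losses following an exact power law L(N) = a * N**(-alpha)
alpha_true, a = 0.5, 10.0
N = np.array([1e3, 1e4, 1e5, 1e6, 1e7])
loss = a * N ** (-alpha_true)

# Linear fit in log-log space: slope of log L vs. log N is -alpha
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
print(-slope)   # recovered scaling exponent
```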

Weighted Neural Tangent Kernel: A Generalized and Improved Network-Induced Kernel [article]

Lei Tan, Shutong Wu, Xiaolin Huang
2021 arXiv   pre-print
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent.  ...  Theoretically, in the infinite-width limit, we prove: i) the stability of the WNTK at initialization and during training, and ii) the equivalence between the WNTK regression estimator and the corresponding  ...  Consider gradient descent of an infinitely wide neural network with initialization θ(0).  ... 
arXiv:2103.11558v1 fatcat:nnw2zkuzf5gjxkskqz56qdlxpy
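A toy sketch (not the paper's construction) of the weighting idea for the simplest possible model, f(x) = θ·x, where the parameter-gradient is just x. The weighted kernel attaches a per-parameter weight to each gradient product; uniform weights recover the ordinary empirical NTK. The weight values below are arbitrary.

```python
import numpy as np

# For f(x) = theta . x, the parameter-gradient df/dtheta is x, so the
# weighted tangent kernel is sum_i w_i * (df/dtheta_i)(x) * (df/dtheta_i)(x').
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, -1.0, 2.0])

def wntk(x, xp, w):
    return np.sum(w * x * xp)

w_uniform = np.ones(3)
w_custom = np.array([2.0, 1.0, 0.5])   # hypothetical per-parameter weights

print(wntk(x1, x2, w_uniform))   # 4.5, equals the plain NTK x1 . x2
print(wntk(x1, x2, w_custom))    # 2.0
```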

Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks [article]

Sanjeev Arora, Simon S. Du, Zhiyuan Li, Ruslan Salakhutdinov, Ruosong Wang, Dingli Yu
2019 arXiv   pre-print
Recent research shows that the following two models are equivalent: (a) infinitely wide neural networks (NNs) trained under l2 loss by gradient descent with infinitesimally small learning rate (b) kernel  ...  On a standard testbed of classification/regression tasks from the UCI database, NTK SVM beats the previous gold standard, Random Forests (RF), and also the corresponding finite nets.  ...  The authors would like to thank Amazon Web Services for providing compute time for the experiments in this paper. We thank Priya Goyal for providing experiment details of Goyal et al. (2019).  ... 
arXiv:1910.01663v3 fatcat:7msqn37qyzehfpc4h3r7izatmy
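Not from the paper; a minimal numpy sketch of the kernel-regression side of the equivalence the snippet states. An RBF kernel stands in here for the NTK, which is architecture-dependent and more involved to compute; the data, kernel bandwidth, and ridge term are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(X1, X2, gamma=1.0):
    # RBF kernel as a stand-in for the NTK of the equivalence result
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy 1-D regression data
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0])
X_test = np.array([[0.5], [1.0]])

# Kernel ridge regression: f(x) = k(x, X) (K + lam I)^{-1} y
lam = 1e-3
K = rbf(X, X)
pred = rbf(X_test, X) @ np.linalg.solve(K + lam * np.eye(len(X)), y)
print(pred)   # should be close to sin(0.5), sin(1.0) given dense training data
```

Replacing `rbf` with the NTK of a given architecture turns this into the kernel predictor that the infinitely wide network converges to.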

On Infinite-Width Hypernetworks [article]

Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang
2021 arXiv   pre-print
As part of this study, we make a mathematical contribution by deriving tight bounds on high order Taylor expansion terms of standard fully connected ReLU networks.  ...  Hypernetworks are architectures that produce the weights of a task-specific primary network.  ...  The contribution of Tomer Galanti is part of Ph.D. thesis research conducted at Tel Aviv University.  ... 
arXiv:2003.12193v7 fatcat:6sqfcokb4baulbe3xgi6dmwj5y

Bayesian neural network priors for edge-preserving inversion [article]

Chen Li, Matthew Dunlop, Georg Stadler
2021 arXiv   pre-print
A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced, motivated by existing results concerning the infinite-width limit of such networks.  ...  We consider Bayesian inverse problems wherein the unknown state is assumed to be a function with discontinuous structure a priori.  ...  CL would like to acknowledge helpful discussions with Yunan Yang. MD would like to thank Neil Chada and Alex Thiery for helpful discussions.  ... 
arXiv:2112.10663v1 fatcat:nyckhqi3kbgvfanyp3g2yhjqna

Dataset Distillation with Infinitely Wide Convolutional Networks [article]

Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee
2022 arXiv   pre-print
To that end, we apply a novel distributed kernel based meta-learning framework to achieve state-of-the-art results for dataset distillation using infinitely wide convolutional neural networks.  ...  For instance, using only 10 datapoints (0.02% of the original dataset), we obtain over 65% test accuracy on the CIFAR-10 image classification task, a dramatic improvement over the previous best test accuracy of  ...  Schoenholz, who proposed and helped develop the overall strategy for our distributed KIP learning methodology. We are also grateful to Ekin Dogus Cubuk and Manuel Kroiss for helpful discussions.  ... 
arXiv:2107.13034v3 fatcat:dqtki2j5v5bjrmakkyk76n76gu

On the Equivalence between Neural Network and Support Vector Machine [article]

Yilan Chen, Wei Huang, Lam M. Nguyen, Tsui-Wei Weng
2021 arXiv   pre-print
Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK).  ...  Under the squared loss, the infinite-width NN trained by gradient descent with an infinitely small learning rate is equivalent to kernel regression with the NTK.  ...  We thank the anonymous reviewers for useful suggestions to improve the paper. We thank Libin Zhu for helpful discussions.  ... 
arXiv:2111.06063v1 fatcat:udd3xu6huzavpohrvtkifuw5ni

Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations [article]

Winnie Xu, Ricky T.Q. Chen, Xuechen Li, David Duvenaud
2022 arXiv   pre-print
This approach brings continuous-depth Bayesian neural nets to a competitive comparison against discrete-depth alternatives, while inheriting the memory-efficient training and tunable precision of Neural  ...  We perform scalable approximate inference in continuous-depth Bayesian neural networks.  ...  Neal, and Patrick Kidger for helpful technical discussions and revisions on earlier drafts of this work.  ... 
arXiv:2102.06559v4 fatcat:c2eccg57prcipbt6bhw3rx3yla

Increasing Depth Leads to U-Shaped Test Risk in Over-parameterized Convolutional Networks [article]

Eshaan Nichani, Adityanarayanan Radhakrishnan, Caroline Uhler
2021 arXiv   pre-print
Recent works have demonstrated that increasing model capacity through width in over-parameterized neural networks leads to a decrease in test risk.  ...  In this work, we demonstrate that the test risk of over-parameterized convolutional networks is a U-shaped curve (i.e. monotonically decreasing, then increasing) with increasing depth.  ...  Acknowledgements The authors were supported by the National Science Foundation (DMS-1651995), Office of Naval Research (N00014-17-1-2147 and N00014-18-1-2765), MIT-IBM Watson AI Lab, and a Simons Investigator  ... 
arXiv:2010.09610v2 fatcat:4tylqykcnnh3hbpdbpg3fzi5zy