6,963 Hits in 3.5 sec

Finite Versus Infinite Neural Networks: an Empirical Study [article]

Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
2020 arXiv   pre-print
We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods.  ...  By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks.  ...  acknowledge the Python community [127] for developing the core set of tools that enabled this work, including NumPy [128], SciPy [129], Matplotlib [130], Pandas [131], Jupyter [132], JAX [133], Neural  ... 
arXiv:2007.15801v2 fatcat:6ervrlzxybgeteh4cpdytu3w2q

Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel [article]

Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
2020 arXiv   pre-print
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight  ...  We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK.  ...  The neural tangent kernel (NTK) has garnered much attention as it provides a theoretical foothold to understand deep networks, at least in an infinite width limit with appropriate initialization scale  ... 
arXiv:2010.15110v1 fatcat:cgusggzoe5ch3dg3dqnfz7224q
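The data-dependent NTK this entry tracks over training can be computed directly for a small network. Below is a minimal sketch of the empirical (finite-width) NTK as a Jacobian outer product; the toy network, parameter shapes, and names are illustrative, not taken from the paper:

```python
import jax
import jax.numpy as jnp

def net(params, x):
    # Toy two-layer network with a scalar output per example.
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2

def empirical_ntk(params, x1, x2):
    # Empirical NTK: Theta(x1, x2) = J(x1) J(x2)^T, where J(x) stacks the
    # Jacobians of the network outputs with respect to every parameter.
    def flat_jac(x):
        jac = jax.jacobian(net)(params, x)  # pytree of per-parameter Jacobians
        leaves = jax.tree_util.tree_leaves(jac)
        return jnp.concatenate([l.reshape(x.shape[0], -1) for l in leaves], axis=1)
    return flat_jac(x1) @ flat_jac(x2).T

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params = (jax.random.normal(k1, (3, 8)) / jnp.sqrt(3.0),
          jax.random.normal(k2, (8, 1)) / jnp.sqrt(8.0))
x = jax.random.normal(k3, (5, 3))
theta = empirical_ntk(params, x, x)  # (5, 5) kernel matrix, symmetric PSD
```

Recomputing `theta` at successive checkpoints is the basic ingredient for studying the kernel's time evolution as the paper describes.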

An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare [article]

Taylor W. Killian, Haoran Zhang, Jayakumar Subramanian, Mehdi Fatemi, Marzyeh Ghassemi
2020 arXiv   pre-print
In this paper, we perform an empirical study of several information encoding architectures using data from septic patients in the MIMIC-III dataset to form representations of a patient state.  ...  To date, how best to construct such states in a healthcare setting is an open question.  ...  Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.  ... 
arXiv:2011.11235v1 fatcat:4ce2gp2m5rbavfbdbnqo54wxra

Information in Infinite Ensembles of Infinitely-Wide Neural Networks [article]

Ravid Shwartz-Ziv, Alexander A. Alemi
2019 arXiv   pre-print
In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks.  ...  We report analytical and empirical investigations in the search for signals that correlate with generalization.  ...  First, we emphasize the somewhat surprising result that, as time goes to infinity, the MI between the outputs of an infinite ensemble of infinitely-wide neural networks and their inputs is finite and quite small  ... 
arXiv:1911.09189v2 fatcat:x2o3illcg5fghkqpqodeuf4or4

Double-descent curves in neural networks: a new perspective using Gaussian processes [article]

Ouns El Harzli, Guillermo Valle-Pérez, Ard A. Louis
2022 arXiv   pre-print
Here we use a neural network Gaussian process (NNGP) which maps exactly to a fully connected network (FCN) in the infinite-width limit, combined with techniques from random matrix theory, to calculate  ...  Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which  ...  The infinite-width limit of (5) is a Gaussian process, called the neural network Gaussian process (NNGP), with respect to the input space, i.e. any collection of finitely many realisations of the process  ... 
arXiv:2102.07238v4 fatcat:a5n5dg2hzncwnjbguzfjttv35a
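As a concrete illustration of the NNGP correspondence this entry relies on, here is a sketch of the standard arc-cosine kernel recursion (Cho & Saul form) for a fully connected ReLU network; the function name and default hyperparameters are illustrative assumptions, not the paper's code:

```python
import numpy as np

def nngp_relu(X, depth=1, sigma_w=np.sqrt(2.0), sigma_b=0.0):
    # NNGP kernel for a fully connected ReLU network of the given depth:
    #   K^{l+1}(x, x') = sigma_b^2 + sigma_w^2 * E[relu(u) relu(u')],
    # with (u, u') ~ N(0, K^l); the Gaussian expectation has the
    # closed-form arc-cosine expression used below.
    K = sigma_b**2 + sigma_w**2 * (X @ X.T) / X.shape[1]
    for _ in range(depth):
        d = np.sqrt(np.diag(K))
        c = np.clip(K / np.outer(d, d), -1.0, 1.0)
        theta = np.arccos(c)
        K = sigma_b**2 + sigma_w**2 * np.outer(d, d) * (
            np.sin(theta) + (np.pi - theta) * c) / (2.0 * np.pi)
    return K

rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
K = nngp_relu(X, depth=3)  # 6x6 covariance of the limiting Gaussian process
```

With `sigma_w**2 = 2` the recursion preserves the diagonal variance, the usual "critical" initialization for ReLU networks.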

The costs of free entry: an empirical study of real estate agents in Greater Boston

Panle Jia Barwick, Parag A. Pathak
2015 The Rand Journal of Economics  
We develop a dynamic empirical model motivated by these patterns to study the extent of inefficiency in the current system compared to alternatives.  ...  To accommodate a large state space, we approximate the value function using sieves and impose the Bellman equation as an equilibrium constraint.  ...  infinite horizon is an important and difficult question.  ... 
doi:10.1111/1756-2171.12082 fatcat:pzbxzwkjgnem7hab4wmurghnpa

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent [article]

Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington
2019 arXiv   pre-print
version even for finite practically-sized networks.  ...  While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized  ...  Sensitivity and generalization in neural networks: an empirical study. In International Conference on Learning Representations, 2018.  ... 
arXiv:1902.06720v4 fatcat:avq7gghjwbgejn4ykozygddjsy
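The linearized model in this entry is a first-order Taylor expansion in the parameters, f_lin(x) = f(x; θ0) + ∇θ f(x; θ0)·(θ − θ0), which can be sketched in a few lines of JAX. The tiny MLP and its shapes below are illustrative, not the paper's architecture:

```python
import jax
import jax.numpy as jnp

def mlp(params, x):
    # Tiny MLP: tanh hidden layer, linear readout.
    w1, b1, w2, b2 = params
    return jnp.tanh(x @ w1 + b1) @ w2 + b2

key = jax.random.PRNGKey(0)
k1, k2 = jax.random.split(key)
params0 = (jax.random.normal(k1, (3, 16)) / jnp.sqrt(3.0), jnp.zeros(16),
           jax.random.normal(k2, (16, 1)) / jnp.sqrt(16.0), jnp.zeros(1))

def linearized(params, x):
    # First-order Taylor expansion of the network around params0:
    # f_lin(x) = f(x; theta0) + grad_theta f(x; theta0) . (theta - theta0),
    # computed as a Jacobian-vector product.
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, df = jax.jvp(lambda p: mlp(p, x), (params0,), (dparams,))
    return f0 + df

x = jnp.ones((4, 3))
# At theta = theta0 the tangent is zero, so the two models agree exactly.
exact_match = bool(jnp.allclose(linearized(params0, x), mlp(params0, x)))
```

Training `linearized` with gradient descent is what yields NTK dynamics; the paper's claim is that for wide networks this model tracks the original one closely throughout training.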

Fast Adaptation with Linearized Neural Networks [article]

Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou
2021 arXiv   pre-print
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions.  ...  The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings.  ...  Thus, our procedure only requires a pretrained neural network on an initial task.  ... 
arXiv:2103.01439v2 fatcat:cazvznfrufgnlmvm56z5bgwwbu

Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization [article]

Francis Bach
2021 arXiv   pre-print
Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain.  ...  In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may  ...  Mean field limit of overparameterized one-hidden layer neural networks We now tackle the study of neural networks with one infinitely wide hidden layer.  ... 
arXiv:2110.08084v1 fatcat:stye5jkm5fhyjiclvmz6olxtly

Is the brain really a small-world network?

Claus C. Hilgetag, Alexandros Goulas
2015 Brain Structure and Function  
This means that the number of accessible nodes grows exponentially with the distance of steps from an initial node, formally corresponding to an infinite topological dimension (while ignoring finite-size  ...  Fig. 1 Classical small-world network (a) versus hierarchical modular network (b)
doi:10.1007/s00429-015-1035-6 pmid:25894630 pmcid:PMC4853440 fatcat:4pdkrnnazbeyvlxjsnc5ptblue

Dataset Distillation with Infinitely Wide Convolutional Networks [article]

Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee
2022 arXiv   pre-print
To that end, we apply a novel distributed kernel-based meta-learning framework to achieve state-of-the-art results for dataset distillation using infinitely wide convolutional neural networks.  ...  Neural Network Transfer In this section, we study how our distilled datasets optimized using KIP and LS transfer to the setting of finite-width neural networks.  ...  We also observe that, as predicted by infinite-width theory [Jacot et al., 2018], the overall gap between KIP or LS performance and that of finite-width neural networks decreases as the width increases.  ... 
arXiv:2107.13034v3 fatcat:dqtki2j5v5bjrmakkyk76n76gu
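Kernel-based distillation methods such as KIP rest on kernel (ridge) regression with the distilled points as the support set; the prediction step can be sketched as follows. The helper name and the linear-kernel sanity check are illustrative assumptions, not the paper's NTK setup:

```python
import numpy as np

def kernel_ridge_predict(K_train, K_test_train, y_train, reg=1e-8):
    # f(x) = K(x, X_s) (K(X_s, X_s) + reg * I)^{-1} y_s, where (X_s, y_s)
    # would be the distilled support set in a KIP-style pipeline.
    n = K_train.shape[0]
    alpha = np.linalg.solve(K_train + reg * np.eye(n), y_train)
    return K_test_train @ alpha

# Sanity check with a linear kernel and linearly generated labels:
# kernel ridge regression should recover the underlying linear map.
rng = np.random.default_rng(0)
Xs, Xt = rng.standard_normal((8, 3)), rng.standard_normal((4, 3))
w = rng.standard_normal(3)
pred = kernel_ridge_predict(Xs @ Xs.T, Xt @ Xs.T, Xs @ w)
```

In KIP the support set itself is optimized, by differentiating a loss on these predictions with respect to the distilled inputs and labels.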

Information Flow in Deep Neural Networks [article]

Ravid Shwartz-Ziv
2022 arXiv   pre-print
In our study, we obtained tractable computations of many information-theoretic quantities and their bounds for infinite ensembles of infinitely wide neural networks.  ...  An analytical framework reveals the underlying structure and optimal representations, and a variational framework using deep neural network optimization validates the results.  ...  An incredible scholar and a lovely person.  ... 
arXiv:2202.06749v2 fatcat:eo3pcousavg3zp5xza57kejjq4

The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective [article]

Geoff Pleiss, John P. Cunningham
2021 arXiv   pre-print
These results make strong predictions about the same phenomenon in conventional neural networks trained with L2 regularization (analogous to a Gaussian prior on parameters): we show that such neural networks  ...  Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of nonparametric hierarchical models that subsume neural  ...  Neural networks versus Deep GP.  ... 
arXiv:2106.06529v2 fatcat:my7nbo52yzgp5h2fkg76hlvcje

On Sparsity in Overparametrised Shallow ReLU Networks [article]

Jaume de Dios, Joan Bruna
2020 arXiv   pre-print
The limit of infinitely wide networks provides an appealing route forward through the mean-field perspective, but a key challenge is to bring learning guarantees back to the finite-neuron setting, where  ...  Towards closing this gap, and focusing on shallow neural networks, in this work we study the ability of different regularisation strategies to capture solutions requiring only a finite number of neurons  ...  Training Overparametrised Neural Networks and Wasserstein Gradient Flows Notice that for empirical measures µ^(m) corresponding to an m-width shallow network, the loss L(µ^(m)) is precisely the loss L(  ... 
arXiv:2006.10225v1 fatcat:2rzbdrfyz5e7jmv367qwudfy2q

Neural Operator: Graph Kernel Network for Partial Differential Equations [article]

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, Anima Anandkumar
2020 arXiv   pre-print
The purpose of this work is to generalize neural networks so that they can learn mappings between infinite-dimensional spaces (operators).  ...  The classical development of neural networks has been primarily for mappings between a finite-dimensional Euclidean space and a set of classes, or between two finite-dimensional Euclidean spaces.  ...  Such an approach closely resembles classical methods such as finite elements, replacing the linear span of a finite set of local basis functions with the space of neural networks.  ... 
arXiv:2003.03485v1 fatcat:yeqofzrn5redrluzufharl3xly
Showing results 1 — 15 out of 6,963 results