Finite Versus Infinite Neural Networks: an Empirical Study
[article]
2020
arXiv
pre-print
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods. ...
By doing so, we resolve a variety of open questions related to the study of infinitely wide neural networks. ...
acknowledge the Python community [127] for developing the core set of tools that enabled this work, including NumPy [128], SciPy [129], Matplotlib [130], Pandas [131], Jupyter [132], JAX [133], Neural ...
arXiv:2007.15801v2
fatcat:6ervrlzxybgeteh4cpdytu3w2q
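As context for entries like the one above (an illustrative sketch, not code from the paper): the closed-form infinite-width NNGP and NTK predictions that such studies compare against finite networks can be computed with the Neural Tangents library built on JAX, which the authors cite among their tools. The architecture, widths, and toy data below are placeholders chosen for illustration.

```python
import jax.numpy as jnp
import neural_tangents as nt
from neural_tangents import stax

# Architecture of the (infinitely wide) network; the width argument (512)
# only affects the finite-width init_fn/apply_fn, not kernel_fn.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1))

# Toy 1-D regression data (placeholder, not from the paper).
x_train = jnp.linspace(-1.0, 1.0, 20).reshape(-1, 1)
y_train = jnp.sin(3.0 * x_train)
x_test = jnp.linspace(-1.0, 1.0, 50).reshape(-1, 1)

# Closed-form infinite-width predictions: 'nngp' corresponds to the Bayesian
# posterior of the infinitely wide network, 'ntk' to the infinitely wide
# network trained to convergence by gradient descent on MSE loss.
predict_fn = nt.predict.gradient_descent_mse_ensemble(kernel_fn, x_train, y_train)
y_nngp, y_ntk = predict_fn(x_test=x_test, get=('nngp', 'ntk'))
```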
Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the Neural Tangent Kernel
[article]
2020
arXiv
pre-print
In suitably initialized wide networks, small learning rates transform deep neural networks (DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well-approximated by a linear weight ...
We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent NTK. ...
The neural tangent kernel (NTK) has garnered much attention as it provides a theoretical foothold to understand deep networks, at least in an infinite width limit with appropriate initialization scale ...
arXiv:2010.15110v1
fatcat:cgusggzoe5ch3dg3dqnfz7224q
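As a rough illustration of the object this entry studies (not the paper's code): the data-dependent, finite-width NTK at a given training step can be evaluated with Neural Tangents' empirical kernel; tracking it across checkpoints gives its time evolution. Architecture and data below are placeholders.

```python
import jax
import neural_tangents as nt
from neural_tangents import stax

# Finite-width network; the empirical NTK is evaluated at whatever
# parameters the network currently has (e.g. at each training checkpoint).
init_fn, apply_fn, _ = stax.serial(
    stax.Dense(256), stax.Relu(), stax.Dense(1))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (8, 10))
_, params = init_fn(key, x.shape)

# Θ_t(x, x') = J_θ f(x; θ_t) J_θ f(x'; θ_t)^T at the current parameters θ_t.
ntk_fn = nt.empirical_ntk_fn(apply_fn)
theta_t = ntk_fn(x, None, params)  # (8, 8) Gram matrix at this training step
```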
An Empirical Study of Representation Learning for Reinforcement Learning in Healthcare
[article]
2020
arXiv
pre-print
In this paper, we perform an empirical study of several information encoding architectures using data from septic patients in the MIMIC-III dataset to form representations of a patient state. ...
To date, how best to construct such states in a healthcare setting is an open question. ...
Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014. ...
arXiv:2011.11235v1
fatcat:4ce2gp2m5rbavfbdbnqo54wxra
Information in Infinite Ensembles of Infinitely-Wide Neural Networks
[article]
2019
arXiv
pre-print
In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. ...
We report analytical and empirical investigations in the search for signals that correlate with generalization. ...
First, we emphasize the somewhat surprising result that, as time goes to infinity, the MI between the outputs of an infinite ensemble of infinitely-wide neural networks and their input is finite and quite small ...
arXiv:1911.09189v2
fatcat:x2o3illcg5fghkqpqodeuf4or4
Double-descent curves in neural networks: a new perspective using Gaussian processes
[article]
2022
arXiv
pre-print
Here we use a neural network Gaussian process (NNGP) which maps exactly to a fully connected network (FCN) in the infinite-width limit, combined with techniques from random matrix theory, to calculate ...
Double-descent curves in neural networks describe the phenomenon that the generalisation error initially descends with increasing parameters, then grows after reaching an optimal number of parameters which ...
The infinite-width limit of (5) is a Gaussian process, called a neural network Gaussian process (NNGP), with respect to the input space, i.e. any collection of finitely many realisations of the process ...
arXiv:2102.07238v4
fatcat:a5n5dg2hzncwnjbguzfjttv35a
The costs of free entry: an empirical study of real estate agents in Greater Boston
2015
The Rand Journal of Economics
We develop a dynamic empirical model motivated by these patterns to study the extent of inefficiency in the current system compared to alternatives. ...
To accommodate a large state space, we approximate the value function using sieves and impose the Bellman equation as an equilibrium constraint. ...
infinite horizon is an important and difficult question. ...
doi:10.1111/1756-2171.12082
fatcat:pzbxzwkjgnem7hab4wmurghnpa
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
[article]
2019
arXiv
pre-print
version even for finite practically-sized networks. ...
While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized ...
Sensitivity and generalization in neural networks: an empirical study. In International Conference on Learning Representations, 2018. ...
arXiv:1902.06720v4
fatcat:avq7gghjwbgejn4ykozygddjsy
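A minimal JAX sketch of the linearization these results refer to, assuming a toy two-layer network with made-up shapes (the papers above use the Neural Tangents library, which provides a similar linearize utility, rather than this hand-rolled version):

```python
import jax
import jax.numpy as jnp

def f(params, x):
    # Tiny two-layer network; widths and data are placeholders.
    h = jnp.tanh(x @ params['W1'] + params['b1'])
    return h @ params['W2'] + params['b2']

def f_lin(params, params0, x):
    # First-order Taylor expansion around the initial parameters:
    #   f_lin(x; θ) = f(x; θ0) + J_θ f(x; θ0) · (θ − θ0)
    dparams = jax.tree_util.tree_map(lambda p, p0: p - p0, params, params0)
    f0, jvp_out = jax.jvp(lambda p: f(p, x), (params0,), (dparams,))
    return f0 + jvp_out

key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
params0 = {'W1': jax.random.normal(k1, (4, 64)) / 2.0, 'b1': jnp.zeros(64),
           'W2': jax.random.normal(k2, (64, 1)) / 8.0, 'b2': jnp.zeros(1)}
# A small perturbation of θ0: the linearized and original networks agree closely.
params = jax.tree_util.tree_map(lambda p: p + 1e-3, params0)
x = jax.random.normal(k3, (5, 4))
print(jnp.max(jnp.abs(f(params, x) - f_lin(params, params0, x))))
```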
Fast Adaptation with Linearized Neural Networks
[article]
2021
arXiv
pre-print
We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. ...
The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. ...
Thus, our procedure only requires a pretrained neural network on an initial task. ...
arXiv:2103.01439v2
fatcat:cazvznfrufgnlmvm56z5bgwwbu
Gradient Descent on Infinitely Wide Neural Networks: Global Convergence and Generalization
[article]
2021
arXiv
pre-print
Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. ...
In this review paper, we consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may ...
Mean field limit of overparameterized one-hidden layer neural networks We now tackle the study of neural networks with one infinitely wide hidden layer. ...
arXiv:2110.08084v1
fatcat:stye5jkm5fhyjiclvmz6olxtly
Is the brain really a small-world network?
2015
Brain Structure and Function
This means that the number of accessible nodes grows exponentially with the distance of steps from an initial node, formally corresponding to an infinite topological dimension (while ignoring finite-size ...
Fig. 1 Classical small-world network (a) versus hierarchical modular network (b) ...
doi:10.1007/s00429-015-1035-6
pmid:25894630
pmcid:PMC4853440
fatcat:4pdkrnnazbeyvlxjsnc5ptblue
Dataset Distillation with Infinitely Wide Convolutional Networks
[article]
2022
arXiv
pre-print
To that end, we apply a novel distributed kernel based meta-learning framework to achieve state-of-the-art results for dataset distillation using infinitely wide convolutional neural networks. ...
Neural Network Transfer In this section, we study how our distilled datasets optimized using KIP and LS transfer to the setting of finite-width neural networks. ...
We also observe that, as predicted by infinite-width theory [Jacot et al., 2018], the overall gap between KIP or LS performance and that of finite-width neural networks decreases as the width increases. ...
arXiv:2107.13034v3
fatcat:dqtki2j5v5bjrmakkyk76n76gu
Information Flow in Deep Neural Networks
[article]
2022
arXiv
pre-print
In our study, we obtained tractable computations of many information-theoretic quantities and their bounds for infinite ensembles of infinitely wide neural networks. ...
An analytical framework reveals the underlying structure and optimal representations, and a variational framework using deep neural network optimization validates the results. ...
An incredible scholar and a lovely person. ...
arXiv:2202.06749v2
fatcat:eo3pcousavg3zp5xza57kejjq4
The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective
[article]
2021
arXiv
pre-print
These results make strong predictions about the same phenomenon in conventional neural networks trained with L2 regularization (analogous to a Gaussian prior on parameters): we show that such neural networks ...
Our analysis in this paper decouples capacity and width via the generalization of neural networks to Deep Gaussian Processes (Deep GP), a class of nonparametric hierarchical models that subsume neural ...
Neural networks versus Deep GP. ...
arXiv:2106.06529v2
fatcat:my7nbo52yzgp5h2fkg76hlvcje
On Sparsity in Overparametrised Shallow ReLU Networks
[article]
2020
arXiv
pre-print
The limit of infinitely wide networks provides an appealing route forward through the mean-field perspective, but a key challenge is to bring learning guarantees back to the finite-neuron setting, where ...
Towards closing this gap, and focusing on shallow neural networks, in this work we study the ability of different regularisation strategies to capture solutions requiring only a finite amount of neurons ...
Training Overparametrised Neural Networks and Wasserstein Gradient Flows: Notice that for empirical measures µ^(m) corresponding to an m-width shallow network, the loss L(µ^(m)) is precisely the loss L( ...
arXiv:2006.10225v1
fatcat:2rzbdrfyz5e7jmv367qwudfy2q
Neural Operator: Graph Kernel Network for Partial Differential Equations
[article]
2020
arXiv
pre-print
The purpose of this work is to generalize neural networks so that they can learn mappings between infinite-dimensional spaces (operators). ...
The classical development of neural networks has been primarily for mappings between a finite-dimensional Euclidean space and a set of classes, or between two finite-dimensional Euclidean spaces. ...
Such an approach closely resembles classical methods such as finite elements, replacing the linear span of a finite set of local basis functions with the space of neural networks. ...
arXiv:2003.03485v1
fatcat:yeqofzrn5redrluzufharl3xly
Showing results 1 — 15 out of 6,963 results