1,413 Hits in 6.9 sec

Inductive Bias of Multi-Channel Linear Convolutional Networks with Bounded Weight Norm [article]

Meena Jagadeesan, Ilya Razenshteyn, Suriya Gunasekar
2022 arXiv   pre-print
We provide a function space characterization of the inductive bias resulting from minimizing the ℓ_2 norm of the weights in multi-channel convolutional neural networks with linear activations and empirically  ...  (b) In contrast, for multi-channel inputs, multiple output channels can be necessary to merely realize all matrix-valued linear functions and thus the inductive bias does depend on C.  ...  Multi-channel linear convolutional network We consider two layer linear convolutional networks with multiple channels in the convolution layer.  ... 
arXiv:2102.12238v4 fatcat:jw3apcmw6nekjnrsbnrrrc7tuu
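A minimal sketch of the setting in the snippet above, assuming 1D circular convolutions, a scalar output, and illustrative dimensions (n, C, K are not taken from the paper): with linear activations, the end-to-end map of a two-layer multi-channel convolutional network is a linear function of the input.

```python
import numpy as np

def circ_conv(x, w):
    """1D circular convolution of signal x with filter w."""
    n = len(x)
    return np.array([sum(w[k] * x[(i - k) % n] for k in range(len(w)))
                     for i in range(n)])

rng = np.random.default_rng(0)
n, C, K = 8, 3, 3             # input length, channels, filter width (illustrative)
W1 = rng.normal(size=(C, K))  # first-layer filters, one per channel
w2 = rng.normal(size=(C, n))  # second-layer linear weights per channel

def net(x):
    hidden = np.stack([circ_conv(x, W1[c]) for c in range(C)])  # (C, n), linear activations
    return np.sum(w2 * hidden)                                   # scalar output

# Linearity: the network computes <m, x> for an effective vector m,
# recoverable by probing with the standard basis.
m = np.array([net(e) for e in np.eye(n)])
x = rng.normal(size=n)
assert np.allclose(net(x), m @ x)
```

The paper studies which such linear maps are preferred under an ℓ_2 weight-norm penalty; this sketch only shows the function class being parameterized.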

Identity Crisis: Memorization and Generalization under Extreme Overparameterization [article]

Chiyuan Zhang and Samy Bengio and Moritz Hardt and Michael C. Mozer and Yoram Singer
2020 arXiv   pre-print
Our work helps to quantify and visualize the sensitivity of inductive biases to architectural choices such as depth, kernel width, and number of channels.  ...  We examine fully-connected and convolutional networks (FCN and CNN), both linear and nonlinear, initialized randomly and then trained to minimize the reconstruction error.  ...  In particular, we consider 2D convolutional networks for data with the structure of multi-channel images.  ... 
arXiv:1902.04698v4 fatcat:y36l5agdcbbd3dx52dfvrpfmlm

Faster Neural Network Training with Approximate Tensor Operations [article]

Menachem Adelman, Kfir Y. Levy, Ido Hakimi, Mark Silberstein
2021 arXiv   pre-print
We apply approximate tensor operations to single and multi-node training of MLP and CNN networks on MNIST, CIFAR-10 and ImageNet datasets.  ...  We propose a novel technique for faster deep neural network training which systematically applies sample-based approximation to the constituent tensor operations, i.e., matrix multiplications and convolutions  ...  Acknowledgments and Disclosure of Funding  ... 
arXiv:1805.08079v3 fatcat:kwhjhtkwrvfzddpd3vvnjr4acq

Local Perception-Aware Transformer for Aerial Tracking [article]

Changhong Fu, Weiyu Peng, Sihang Li, Junjie Ye, Ziang Cao
2022 arXiv   pre-print
However, the Transformer structure lacks sufficient inductive bias.  ...  Inductive Bias of Transformer Achieving promising results in vision tasks, the inductive biases of the typical Transformer are exploited further.  ... 
arXiv:2208.00662v1 fatcat:z4caixi6ojhlxmilaicqpwluna

Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators [article]

Reinhard Heckel, Mahdi Soltanolkotabi
2020 arXiv   pre-print
In this paper, we attribute this effect to a particular architectural choice of convolutional networks, namely convolutions with fixed interpolating filters.  ...  Convolutional Neural Networks (CNNs) have emerged as highly successful tools for image generation, recovery, and restoration.  ...  The number of channels, k, determines the number of weight parameters of the network, given by dk^2 + k_out k.  ... 
arXiv:1910.14634v2 fatcat:qnfvd3kqw5hpterc733pjoasmq
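The quoted parameter count dk^2 + k_out k follows from d layers of k-channel 1x1 convolutions (k*k weights each; the fixed interpolating upsampling filters contribute no parameters) plus a final 1x1 convolution mapping k channels to k_out output channels. A one-line helper makes this checkable; the concrete values below are illustrative, not taken from the paper.

```python
def num_params(d, k, k_out):
    """Weight count of a deep-decoder-style generator:
    d layers of k x k (1x1 conv) weights, plus a k -> k_out output layer."""
    return d * k * k + k_out * k

# e.g. 6 layers, 128 channels, RGB output:
assert num_params(d=6, k=128, k_out=3) == 6 * 128**2 + 3 * 128
```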

Inductive Bias of Deep Convolutional Networks through Pooling Geometry [article]

Nadav Cohen, Amnon Shashua
2017 arXiv   pre-print
Our formal understanding of the inductive bias that drives the success of convolutional networks on computer vision tasks is limited.  ...  In this paper we study the ability of convolutional networks to model correlations among regions of their input.  ...  These models follow the standard paradigm of locality, weight sharing and pooling, yet differ from the most conventional convolutional networks in that their point-wise activations are linear, with non-linearity  ... 
arXiv:1605.06743v4 fatcat:oqrurf6s7bhslaatqbb3lvh2im

Regularisation of neural networks by enforcing Lipschitz continuity

Henry Gouk, Eibe Frank, Bernhard Pfahringer, Michael J. Cree
2020 Machine Learning  
To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant, for multiple p-norms, of a feed-forward neural network composed of commonly used layer types.  ...  Abstract: We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs.  ...  such inductive biases: more informative inductive biases should yield better sample efficiency.  ... 
doi:10.1007/s10994-020-05929-w fatcat:bdeuxxsnjbfq3n6tdhg2ooyo6m
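The upper bound described above is, at its core, a product of per-layer operator norms. A minimal dense-layer sketch follows, assuming 1-Lipschitz activations such as ReLU; the paper additionally covers convolutional and other layer types, which are not handled here.

```python
import numpy as np

def op_norm(W, p):
    """Operator norm of a dense weight matrix:
    p=1 -> max column sum, p=inf -> max row sum, p=2 -> largest singular value."""
    if p == 1:
        return np.abs(W).sum(axis=0).max()
    if p == np.inf:
        return np.abs(W).sum(axis=1).max()
    return np.linalg.svd(W, compute_uv=False)[0]

def lipschitz_upper_bound(weights, p=2):
    # With 1-Lipschitz activations, per-layer operator norms multiply
    # into an upper bound on the network's Lipschitz constant.
    return float(np.prod([op_norm(W, p) for W in weights]))

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(64, 32)), rng.normal(size=(10, 64))]
bound = lipschitz_upper_bound(Ws, p=2)

# Sanity check: the bound dominates observed Lipschitz ratios of a ReLU net.
def f(v):
    return Ws[1] @ np.maximum(Ws[0] @ v, 0.0)

x, y = rng.normal(size=32), rng.normal(size=32)
assert np.linalg.norm(f(x) - f(y)) <= bound * np.linalg.norm(x - y)
```

Enforcing the constraint during training (the paper's regularisation technique) would rescale each W whenever its norm exceeds a chosen budget.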

An Optimization and Generalization Analysis for Max-Pooling Networks [article]

Alon Brutzkus, Amir Globerson
2021 arXiv   pre-print
In particular, they are part of most convolutional architectures used in machine vision, since pooling is a natural approach to pattern detection problems.  ...  Here we perform a theoretical analysis of a convolutional max-pooling architecture, proving that it can be globally optimized, and can generalize well even for highly over-parameterized models.  ...  Other works study the inductive bias of gradient descent on fully connected linear or non-linear networks (Ji & Telgarsky, 2019a; Arora et al., 2019a; Wei et al., 2019; Brutzkus et al., 2018; Dziugaite  ... 
arXiv:2002.09781v4 fatcat:oi6ltd3dkre3ljc4qwqxvmitja

Regularisation of Neural Networks by Enforcing Lipschitz Continuity [article]

Henry Gouk, Eibe Frank, Bernhard Pfahringer, Michael J. Cree
2020 arXiv   pre-print
To this end, we provide a simple technique for computing an upper bound to the Lipschitz constant, for multiple p-norms, of a feed-forward neural network composed of commonly used layer types.  ...  We investigate the effect of explicitly enforcing the Lipschitz continuity of neural networks with respect to their inputs.  ...  can produce well-performing models with fewer training examples than an algorithm without such inductive biases: more informative inductive biases should yield better sample efficiency.  ... 
arXiv:1804.04368v3 fatcat:xhcxvd7utff7blaswrbws3drqi

Local Disentanglement in Variational Auto-Encoders Using Jacobian L_1 Regularization [article]

Travers Rhodes, Daniel D. Lee
2021 arXiv   pre-print
of the latent representation with individual factors of variation.  ...  independent factors of variation in images of multiple objects or images with multiple parts.  ...  This use of the L_1 norm to choose an orientation is inspired by similar use in linear models.  ... 
arXiv:2106.02923v2 fatcat:ardztcvy45ez7egasoro4fz3dq

On the Spectral Bias of Convolutional Neural Tangent and Gaussian Process Kernels [article]

Amnon Geifman, Meirav Galun, David Jacobs, Ronen Basri
2022 arXiv   pre-print
We prove that, with normalized multi-channel input and ReLU activation, the eigenfunctions of these kernels with the uniform measure are formed by products of spherical harmonics, defined over the channels  ...  Our results provide concrete quantitative characterization of over-parameterized convolutional network architectures.  ...  Acknowledgement This research is partially supported by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center and by research grants from the Estate of Tully and  ... 
arXiv:2203.09255v1 fatcat:qdr5a5hetbhyba27mpqjbsdnwe

On Measuring Excess Capacity in Neural Networks [article]

Florian Graf, Sebastian Zeng, Bastian Rieck, Marc Niethammer, Roland Kwitt
2022 arXiv   pre-print
The capacity-driving terms in our bounds are the Lipschitz constants of the layers and a (2,1) group norm distance to the initializations of the convolution weights.  ...  Overall, this suggests a notion of compressibility with respect to weight norms, orthogonal to classic compression via weight pruning.  ...  Next, we study the general case of 2D multi-channel convolutions with strides.  ... 
arXiv:2202.08070v2 fatcat:pldznz7rjrgmdluvln5wyttvna
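The "(2,1) group norm distance to the initializations of the convolution weights" in the snippet above can be written down directly. The grouping below (one group per output filter, each filter flattened) is an assumption for illustration and may differ from the paper's exact grouping; the weight shapes are likewise illustrative.

```python
import numpy as np

def dist_2_1(W, W0):
    """(2,1) group norm of W - W0 for conv weights of shape
    (out_channels, in_channels, kh, kw): l2 norm per output filter, then summed."""
    D = (W - W0).reshape(W.shape[0], -1)
    return np.linalg.norm(D, axis=1).sum()

rng = np.random.default_rng(0)
W0 = rng.normal(size=(16, 3, 3, 3))        # weights at initialization
W = W0 + 0.01 * rng.normal(size=W0.shape)  # weights after training (simulated drift)
d = dist_2_1(W, W0)

assert d >= 0
assert np.isclose(dist_2_1(W0, W0), 0.0)   # zero distance at initialization
```

A small value of this distance is what the paper's notion of excess capacity and compressibility with respect to weight norms refers to.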

HyperInvariances: Amortizing Invariance Learning [article]

Ruchika Chavhan, Henry Gouk, Jan Stühmer, Timothy Hospedales
2022 arXiv   pre-print
Providing invariances in a given learning task conveys a key inductive bias that can lead to sample-efficient learning and good generalisation, if correctly specified.  ...  In an up-front learning phase, we learn a low-dimensional manifold of feature extractors spanning invariance to different transformations using a hyper-network.  ...  In particular, the second term of the bound would be proportional to a product of norms of the weight matrices in each layer, thus scaling exponentially with the depth of the network.  ... 
arXiv:2207.08304v1 fatcat:pryxbpkzybbv3dri7ed3l5uk2a

Deep Learning and Quantum Entanglement: Fundamental Connections with Implications to Network Design [article]

Yoav Levine, David Yakira, Nadav Cohen, Amnon Shashua
2017 arXiv   pre-print
This description enables us to carry a graph-theoretic analysis of a convolutional network, with which we demonstrate a direct control over the inductive bias of the deep network via its channel numbers  ...  We use this connection for asserting novel theoretical observations regarding the role that the number of channels in each layer of the convolutional network fulfills in the overall inductive bias.  ...  Acknowledgements We have benefited from discussions with Or Sharir, Ronen Tamari, Markus Hauru and Eyal Leviatan.  ... 
arXiv:1704.01552v2 fatcat:4y5dlwfdxzfffceefbkpqsvrmy

Training invariances and the low-rank phenomenon: beyond linear networks [article]

Thien Le, Stefanie Jegelka
2022 arXiv   pre-print
The implicit bias induced by the training of neural networks has become a topic of rigorous study.  ...  In the limit of gradient flow and gradient descent with appropriate step size, it has been shown that when one trains a deep linear network with logistic or exponential loss on linearly separable data,  ...  We would also like to thank Matus Telgarsky for fruitful discussions on their related papers and on the Clarke subdifferential, and Kaifeng Lyu for pointing out an error in an earlier version of this paper  ... 
arXiv:2201.11968v2 fatcat:gwbq5gclfzfdzbfhgupenqc2va