
Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks [article]

Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi
2020 arXiv   pre-print
Leveraging tools from Markov chain theory, we derive a meaningful lower rank bound in deep linear networks. Empirically, we also demonstrate that this rank robustness generalizes to ReLU nets.  ...  In this work we highlight the fact that batch normalization is an effective strategy to avoid rank collapse for both linear and ReLU networks.  ...  in batch normalized networks and can serve as a solid basis for a more complete theoretical understanding.  ... 
arXiv:2003.01652v3 fatcat:sifsf7zdffahnkkfl7w6bf5uyi
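
As a rough illustration of the rank-collapse effect this entry studies (a sketch under assumed settings: width 64, random Gaussian weights, a plain per-feature standardization standing in for batch normalization; not the authors' experiments), one can track the numerical rank of the hidden features of a random deep linear network with and without the normalization step:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((256, 64))            # batch of 256 inputs, width 64

    def numerical_rank(H, rel_tol=1e-3):
        # count singular values above a relative threshold
        s = np.linalg.svd(H, compute_uv=False)
        return int((s > s[0] * rel_tol).sum())

    def hidden_rank(depth, use_bn):
        H = X
        for _ in range(depth):
            W = rng.standard_normal((64, 64)) / np.sqrt(64)   # random linear layer
            H = H @ W
            if use_bn:                                        # batch norm without affine part
                H = (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-5)
        return numerical_rank(H)

    for depth in (10, 50, 200):
        print(depth, "layers | rank without BN:", hidden_rank(depth, False),
              "| with BN:", hidden_rank(depth, True))

In this toy setting the unnormalized features become dominated by a few directions as depth grows, while the normalized features stay close to full rank, which is the qualitative behavior the paper quantifies.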

NIPS 2016 Tutorial: Generative Adversarial Networks [article]

Ian Goodfellow
2017 arXiv   pre-print
The tutorial describes: (1) Why generative modeling is a topic worth studying, (2) how generative models work, and how GANs compare to other generative models, (3) the details of how GANs work, (4) research  ...  frontiers in GANs, and (5) state-of-the-art image models that combine GANs with other methods.  ...  Many thanks also to those who commented on his Twitter and Facebook posts asking which topics would be of interest to the tutorial audience. Thanks also to D.  ... 
arXiv:1701.00160v4 fatcat:m4z3oxl5erainflnwiui5ken5u
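
For context, the two-player game that "how GANs work" refers to is the standard minimax objective between a generator G and a discriminator D (written here in the usual notation of the GAN literature, not quoted from the tutorial):

    \min_G \max_D V(D, G) =
        \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
        + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]

D is trained to tell real samples from generated ones, while G is trained to make D(G(z)) large; at the game's equilibrium the generator distribution matches p_data.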

Training Quantized Nets: A Deeper Understanding [article]

Hao Li, Soham De, Zheng Xu, Christoph Studer, Hanan Samet, Tom Goldstein
2017 arXiv   pre-print
In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions.  ...  quantized training methods lack, which explains the difficulty of training using low-precision arithmetic.  ...  Contributions: This paper studies quantized training methods from a theoretical perspective, with the goal of understanding the differences in behavior, and reasons for success or failure, of various methods  ... 
arXiv:1706.02379v3 fatcat:rryfpq3gufbd3gv4qfb65kbz5y
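
One ingredient of low-precision training that this line of analysis looks at is stochastic rounding, which keeps quantization unbiased in expectation. A minimal NumPy sketch (the grid step 2^-4 and the helper name are illustrative assumptions, not the paper's code):

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_round(w, step=2.0 ** -4):
        # round each weight up or down to the nearest grid point, with
        # probability proportional to its distance, so E[round(w)] == w
        scaled = w / step
        lower = np.floor(scaled)
        prob_up = scaled - lower
        return (lower + (rng.random(w.shape) < prob_up)) * step

    w = np.array([0.100, -0.037, 0.255])
    print(stochastic_round(w))                                   # weights snapped to the 2^-4 grid
    print(np.mean([stochastic_round(w) for _ in range(20000)], axis=0))  # close to w on average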

Gradual Learning of Recurrent Neural Networks [article]

Ziv Aharoni, Gal Rattner, Haim Permuter
2018 arXiv   pre-print
Motivated by the Data Processing Inequality (DPI), we formulate the multi-layered network as a Markov chain, introducing a training method that comprises training the network gradually and using layer-wise  ...  Recurrent Neural Networks (RNNs) achieve state-of-the-art results in many sequence-to-sequence modeling tasks. However, RNNs are difficult to train and tend to suffer from overfitting.  ...  GL increases the network depth gradually as training progresses, and LWGC adjusts a gradient clipping norm in a layerwise manner at every learning phase of the training.  ... 
arXiv:1708.08863v2 fatcat:l4wuptmwm5cfpfvcuwovfb4xja
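
The LWGC idea mentioned in the snippet, clipping gradients with a separate norm budget per layer rather than one global threshold, can be sketched as follows (a plain NumPy illustration with made-up layer shapes and fixed budgets, not the authors' implementation; in the paper the budgets are additionally adjusted at every learning phase):

    import numpy as np

    def clip_layerwise(grads, max_norms):
        # rescale each layer's gradient independently so its L2 norm
        # stays within that layer's own budget
        clipped = []
        for g, budget in zip(grads, max_norms):
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, budget / (norm + 1e-12)))
        return clipped

    rng = np.random.default_rng(0)
    grads = [rng.standard_normal((64, 64)),
             rng.standard_normal((64, 64)),
             rng.standard_normal((64, 10))]
    for g in clip_layerwise(grads, max_norms=[1.0, 1.0, 0.5]):
        print(round(float(np.linalg.norm(g)), 3))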

Understanding Autoencoders with Information Theoretic Concepts [article]

Shujian Yu, Jose C. Principe
2019 arXiv   pre-print
In this paper, we illustrate an advanced information theoretic methodology to understand the dynamics of learning and the design of autoencoders, a special type of deep learning architecture that resembles  ...  Despite their great success in practical applications, there is still a lack of theoretical and systematic methods to analyze deep neural networks.  ...  Robert Jenssen from the UiT - The Arctic University of Norway for their careful reading of our manuscript and many insightful comments and suggestions.  ... 
arXiv:1804.00057v3 fatcat:5kne3je7cbbx5bvc2d5vro7fde

Generative methods for sampling transition paths in molecular dynamics [article]

Tony Lelièvre, Geneviève Robin, Inass Sekkat, Gabriel Stoltz, Gabriel Victorino Cardoso
2022 arXiv   pre-print
Molecular systems often remain trapped for long times around some local minimum of the potential energy function, before switching to another one -- a behavior known as metastability.  ...  In view of the promises of machine learning techniques, we explore in this work two approaches to more efficiently generate transition paths: sampling methods based on generative models such as variational  ...  The work of T.L. and G.S. is funded in part by the European Research Council (ERC) under the European Union's Horizon 2020  ... 
arXiv:2205.02818v1 fatcat:7raxlkwp6fcsdoc7j4frgtde7e

A Chain Graph Interpretation of Real-World Neural Networks [article]

Yuesong Shen, Daniel Cremers
2020 arXiv   pre-print
It is thus a promising framework that deepens our understanding of neural networks and provides a coherent theoretical formulation for future deep learning research.  ...  The last decade has witnessed a boom of deep learning research and applications achieving state-of-the-art results in various domains.  ...  Acknowledgments and Disclosure of Funding: We would like to thank Tao Wu and Florian Bernard for helpful discussions and proofreading.  ... 
arXiv:2006.16856v2 fatcat:vp3khlurmbfejgbjdylruw7mo4

What Are Bayesian Neural Network Posteriors Really Like? [article]

Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson
2021 arXiv   pre-print
We show that (1) BNNs can achieve significant performance gains over standard training and deep ensembles; (2) a single long HMC chain can provide a comparable representation of the posterior to multiple  ...  For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC).  ...  We draw the parameters of the network from a Gaussian distribution with mean 0 and standard deviation 0.1.  ... 
arXiv:2104.14421v1 fatcat:vu3qcg6nnrdaxjixb6yc55pkfm
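
For orientation, the object that both HMC and the cheaper mini-batch methods approximate is the posterior over network weights. Reading the mean-0, standard-deviation-0.1 Gaussian mentioned in the snippet as the prior scale (an interpretation of the truncated context), the target takes the standard form

    p(w \mid \mathcal{D}) \;\propto\; \mathcal{N}\!\big(w \mid 0,\ \alpha^2 I\big)\,
        \prod_{i=1}^{n} p\big(y_i \mid x_i, w\big), \qquad \alpha = 0.1,

and HMC draws samples from this distribution by simulating Hamiltonian dynamics on its log density, which is what makes a single long chain expensive but accurate.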

Generalization Bounds For Meta-Learning: An Information-Theoretic Analysis [article]

Qi Chen, Changjian Shui, Mario Marchand
2021 arXiv   pre-print
We derive a novel information-theoretic analysis of the generalization property of meta-learning algorithms.  ...  Concretely, our analysis proposes a generic understanding of both the conventional learning-to-learn framework and the modern model-agnostic meta-learning (MAML) algorithms.  ...  Acknowledgments and Disclosure of Funding  ... 
arXiv:2109.14595v2 fatcat:iurfiajb7be2rbkuywqxiqthbi

Information Dropout: Learning Optimal Representations Through Noisy Computation

Alessandro Achille, Stefano Soatto
2018 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We show that our regularized loss function can be efficiently minimized using Information Dropout, a generalization of dropout rooted in information theoretic principles that automatically adapts to the  ...  We show that this can be solved by adding a regularization term, which is in turn related to injecting multiplicative noise in the activations of a Deep Neural Network, a special case of which is the common  ...  We are very grateful to the reviewers for their thorough analysis of the paper.  ... 
doi:10.1109/tpami.2017.2784440 pmid:29994167 fatcat:ejcrnroedvhtxjl4vb7vj3vwgu
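
The multiplicative-noise mechanism described in this entry can be sketched as follows: each activation is multiplied by log-normal noise whose scale is predicted per unit, so the layer learns how much information to let through. This is a minimal illustration with made-up shapes and a fixed noise scale, not the authors' implementation; a constant scale recovers ordinary multiplicative dropout.

    import numpy as np

    rng = np.random.default_rng(0)

    def information_dropout(h, log_alpha):
        # multiply activations by log-normal noise eps with log(eps) ~ N(0, alpha^2);
        # alpha would normally be produced by the network, here it is just given
        alpha = np.exp(log_alpha)
        eps = np.exp(alpha * rng.standard_normal(h.shape))
        return h * eps

    h = np.maximum(0.0, rng.standard_normal((4, 8)))      # hypothetical ReLU activations
    print(information_dropout(h, log_alpha=np.full(h.shape, -1.0)))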

The Role of Cross-Silo Federated Learning in Facilitating Data Sharing in the Agri-Food Sector [article]

Aiden Durrant, Milan Markovic, David Matthews, David May, Jessica Enright, Georgios Leontidis
2021 arXiv   pre-print
machine learning model that facilitates data sharing across supply chains.  ...  Protectiveness of data is natural in this setting; data is a precious commodity for data owners, which if used properly can provide them with useful insights on operations and processes leading to a competitive  ...  Acknowledgements: This work was supported by an award made by the UKRI/EPSRC-funded Internet of Food Things Network+ grant EP/R045127/1.  ... 
arXiv:2104.07468v1 fatcat:kpnn66urhra7vn3pgycutcxo5m

Layer-wise Learning of Stochastic Neural Networks with Information Bottleneck [article]

Thanh T. Nguyen, Jaesik Choi
2019 arXiv   pre-print
Though the original IB has been extensively studied, there has not been much understanding of multiple bottlenecks which better fit in the context of neural networks.  ...  We thus propose a simple compromised scheme of IMB which in turn generalizes the maximum likelihood estimate (MLE) principle in the context of stochastic neural networks.  ...  (e.g., early stopping, weight decay, dropout [31], and batch normalization [16]), and optimization methods [17].  ... 
arXiv:1712.01272v5 fatcat:p4sezgntjjaklgphttenf2jwme

Understanding the wiring evolution in differentiable neural architecture search [article]

Sirui Xie, Shoukang Hu, Xinjiang Wang, Chunxiao Liu, Jianping Shi, Xunying Liu, Dahua Lin
2021 arXiv   pre-print
To understand how wiring topology evolves, we study the underlying mechanism of several existing differentiable NAS frameworks.  ...  To anatomize these phenomena, we propose a unified view on searching algorithms of existing frameworks, transferring the global optimization to local cost minimization.  ...  in the batch normalization is set to be 1e-5. Here we strictly follow the training setup in (Liu et al., 2018), with BN affine (Ioffe and Szegedy, 2015) disabled.  ... 
arXiv:2009.01272v4 fatcat:ylvxns3gd5drvfjrqrhfngv6le

Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck [article]

Vishnu Raj, Nancy Nayak, Sheetal Kalyani
2020 arXiv   pre-print
In this paper, we present an information-theoretic perspective of BNN training.  ...  We analyze BNNs through the Information Bottleneck principle and observe that the training dynamics of BNNs is considerably different from that of Deep Neural Networks (DNNs).  ...  IB principle [14, 13] formulates intermediate hidden layer activations in a neural network as a successive Markov chain.  ... 
arXiv:2006.07522v1 fatcat:3tz44z7ia5hw5ftohiqg67z2b4

Emergence of Invariance and Disentanglement in Deep Representations [article]

Alessandro Achille, Stefano Soatto
2018 arXiv   pre-print
as a measure of complexity of a learned model, yielding a novel Information Bottleneck for the weights.  ...  We propose regularizing the loss by bounding such a term in two equivalent ways: one with a Kullback-Leibler term, which relates to a PAC-Bayes perspective; the other using the information in the weights  ...  In particular we have the Markov chain y → x → z.  ... 
arXiv:1706.01350v3 fatcat:jr3iz4pvsreazenv5rmvprwpvy
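
The Markov chain mentioned in the last snippet is the one underlying most of the information-theoretic entries above: the label y generates the data x, which the network maps to a representation z. The Data Processing Inequality then bounds the label information any representation can retain, and the same argument applies layer by layer:

    y \rightarrow x \rightarrow z \;\Longrightarrow\; I(y; z) \le I(y; x),
    \qquad
    x \rightarrow z_1 \rightarrow \cdots \rightarrow z_L \;\Longrightarrow\;
        I(x; z_1) \ge I(x; z_2) \ge \cdots \ge I(x; z_L).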
Showing results 1 — 15 out of 273 results