1,708 Hits in 4.6 sec

An Evaluation of Fisher Approximations Beyond Kronecker Factorization

César Laurent, Thomas George, Xavier Bouthillier, Nicolas Ballas, Pascal Vincent
2018 International Conference on Learning Representations  
We study two coarser approximations on top of a Kronecker factorization (K-FAC) of the Fisher Information Matrix, to scale up Natural Gradient to deep and wide Convolutional Neural Networks (CNNs).  ...  Both variants yield a further block-diagonal approximation tailored for CNNs, which is much more efficient to compute and invert.  ...  The authors would like to acknowledge the support of Calcul Quebec, Compute Canada, CIFAR and Facebook for research funding and computational resources.  ... 
dblp:conf/iclr/LaurentGBBV18 fatcat:u3hfg4kp2ranlmmvli3hd5anyu

Distributed Second-Order Optimization using Kronecker-Factored Approximations

Jimmy Ba, Roger B. Grosse, James Martens
2017 International Conference on Learning Representations  
Finally, we show that our distributed K-FAC method speeds up training of various state-of-the-art ImageNet classification models by a factor of two compared to an improved form of Batch Normalization.  ...  Unfortunately, they often employ severe approximations to the curvature matrix in order to scale to large models with millions of parameters, limiting their effectiveness in practice versus well-tuned  ...  Specifically, we approximate the second-order statistics matrix of the inputs as itself factoring as a Kronecker product. This gives an approximation which is a Kronecker product of three matrices.  ... 
dblp:conf/iclr/BaGM17 fatcat:kdqwtffwgravvgnrr6r2wirc7e

Kronecker-factored Curvature Approximations for Recurrent Neural Networks

James Martens, Jimmy Ba, Matt Johnson
2018 International Conference on Learning Representations  
It is based on an approximation to the Fisher information matrix (FIM) that makes assumptions about the particular structure of the network and the way it is parameterized.  ...  Kronecker-factored Approximate Curvature (K-FAC) (Martens & Grosse, 2015) is a 2nd-order optimization method which has been shown to give state-of-the-art performance on large-scale neural network optimization  ...  It remains to show that the approximations developed in Section 3.5 can be combined with the Kronecker-factored approximations for V_0 and V_1 from Section 3.2.3 to yield an  ... 
dblp:conf/iclr/MartensBJ18 fatcat:ezcpmvsvzvcmpigyf65xoggu3u

A Kronecker-factored approximate Fisher matrix for convolution layers [article]

Roger Grosse, James Martens
2016 arXiv   pre-print
Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient  ...  We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated  ...  We introduce Kronecker Factors for Convolution (KFC) , an approximation to the Fisher matrix for convolutional networks.  ... 
arXiv:1602.01407v2 fatcat:m4gaqeqdyngfrjuaxvps6c4lve
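The efficiency claim in this entry rests on two standard Kronecker identities: (A ⊗ G)^{-1} = A^{-1} ⊗ G^{-1}, and (A ⊗ G) vec(V) = vec(G V Aᵀ) with column-major vec. A minimal NumPy sketch of why the factored form is cheap to invert (illustration only, not the paper's code; the shapes and random factors are placeholders):

```python
import numpy as np

# Why a Kronecker-factored Fisher block is cheap to invert:
# (A kron G)^{-1} = A^{-1} kron G^{-1}, and multiplying by it never requires
# forming the large matrix, because (A kron G) vec(V) = vec(G V A^T)
# (column-major vec). Factors below are random placeholders.
rng = np.random.default_rng(0)
n_in, n_out = 4, 3
A = np.cov(rng.standard_normal((n_in, 50))) + 0.1 * np.eye(n_in)    # input-side factor
G = np.cov(rng.standard_normal((n_out, 50))) + 0.1 * np.eye(n_out)  # gradient-side factor
V = rng.standard_normal((n_out, n_in))                               # e.g. a weight gradient

# Naive: build the (n_in*n_out)^2 matrix and solve.
F = np.kron(A, G)
naive = np.linalg.solve(F, V.flatten(order="F")).reshape((n_out, n_in), order="F")

# Kronecker-factored: two small solves, F^{-1} vec(V) = vec(G^{-1} V A^{-1}).
fast = np.linalg.solve(G, V) @ np.linalg.inv(A)

assert np.allclose(naive, fast)
```

Both computations agree; only the factored one scales to layer-sized matrices.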

Online Structured Laplace Approximations For Overcoming Catastrophic Forgetting [article]

Hippolyt Ritter, Aleksandar Botev, David Barber
2018 arXiv   pre-print
We introduce the Kronecker factored online Laplace approximation for overcoming catastrophic forgetting in neural networks.  ...  In order to make our method scalable, we leverage recent block-diagonal Kronecker factored approximations to the curvature.  ...  An extension of the Kronecker factored curvature approximations to convolutional neural networks is presented in [10].  ... 
arXiv:1805.07810v1 fatcat:azw6j7ky5ndzhfthecx56kqr6m
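The penalty such Laplace-based continual-learning methods add is a quadratic form in the weight change whose precision is the (here Kronecker-factored) curvature. A small sketch, assuming symmetric factors A and G and a placeholder strength lam (not the authors' code), showing the penalty can be evaluated from the small factors alone:

```python
import numpy as np

# Quadratic Laplace penalty with a Kronecker-factored curvature:
# 0.5 * lam * vec(W - W_star)^T (A kron G) vec(W - W_star)
#   = 0.5 * lam * trace(A (W - W_star)^T G (W - W_star)),
# so the full (A kron G) matrix never needs to be built. A, G assumed symmetric.
def kfac_laplace_penalty(W, W_star, A, G, lam=1.0):
    dW = W - W_star                          # deviation from the previous task's mode
    return 0.5 * lam * np.trace(A @ dW.T @ G @ dW)

rng = np.random.default_rng(1)
n_out, n_in = 3, 4
A = rng.standard_normal((n_in, n_in)); A = A @ A.T + np.eye(n_in)      # input factor (placeholder)
G = rng.standard_normal((n_out, n_out)); G = G @ G.T + np.eye(n_out)   # gradient factor (placeholder)
W_star = rng.standard_normal((n_out, n_in))
W = W_star + 0.01 * rng.standard_normal((n_out, n_in))

# Same value as the explicit full-matrix form:
dw = (W - W_star).flatten(order="F")
full = 0.5 * dw @ np.kron(A, G) @ dw
assert np.isclose(kfac_laplace_penalty(W, W_star, A, G), full)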

L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation [article]

Christian S. Perone, Roberto Pereira Silveira, Thomas Paula
2021 arXiv   pre-print
However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment.  ...  In this work, we revisit the Laplace approximation, a classical approach to posterior approximation that is computationally attractive.  ...  Although the Kronecker factorization can yield a better approximation than a diagonal approximation of the curvature matrix, the Kronecker factors still have to be computed in a separate step  ... 
arXiv:2107.04695v1 fatcat:sveyll64kbhpragzn7hs233jsq
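A minimal sketch of the underlying idea, restricted to the diagonal special case and with placeholder scaling choices (not the exact L2M estimator): the per-parameter gradient second moment, which Adam-style optimizers already track, is reused as the precision of a Laplace posterior.

```python
import numpy as np

# Diagonal Laplace posterior built from the gradient second moment.
# The scaling by the dataset size and the prior term are placeholder choices,
# not the exact L2M construction.
def diagonal_laplace_precision(batch_grads, n_data, prior_precision=1.0):
    second_moment = np.mean([g ** 2 for g in batch_grads], axis=0)   # E[g^2] per parameter
    return n_data * second_moment + prior_precision

rng = np.random.default_rng(2)
batch_grads = [0.1 * rng.standard_normal(5) for _ in range(100)]     # fake mini-batch gradients
posterior_var = 1.0 / diagonal_laplace_precision(batch_grads, n_data=10_000)
```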

Noisy Natural Gradient as Variational Inference [article]

Guodong Zhang and Shengyang Sun and David Duvenaud and Roger Grosse
2018 arXiv   pre-print
This insight allows us to train full-covariance, fully factorized, or matrix-variate Gaussian variational posteriors using noisy versions of natural gradient, Adam, and K-FAC, respectively, making it possible  ...  fully factorized) or expensive and complicated inference procedures.  ...  Acknowledgements GZ was supported by an NSERC Discovery Grant, and SS was supported by a Connaught New Researcher Award and a Connaught Fellowship.  ... 
arXiv:1712.02390v2 fatcat:i7k7zuersbeazfdwsa2wyrxgxu
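The matrix-variate Gaussian posterior mentioned in this entry is a Gaussian over a weight matrix whose covariance factors as a Kronecker product, so sampling needs only two small covariance factors. A generic sketch (conventions vary between references; this is not the authors' training code):

```python
import numpy as np

# Sampling W ~ MN(M, U, V), i.e. vec(W) ~ N(vec(M), V kron U), with U the
# among-row and V the among-column covariance. Only the small Cholesky
# factors are needed; shapes and covariances below are placeholders.
def sample_matrix_normal(M, U, V, rng):
    L_U = np.linalg.cholesky(U)
    L_V = np.linalg.cholesky(V)
    Z = rng.standard_normal(M.shape)
    return M + L_U @ Z @ L_V.T

rng = np.random.default_rng(3)
n_out, n_in = 3, 4
M = np.zeros((n_out, n_in))      # posterior mean of the weight matrix
U = 0.1 * np.eye(n_out)          # row (output) covariance factor
V = 0.2 * np.eye(n_in)           # column (input) covariance factor
W = sample_matrix_normal(M, U, V, rng)
```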

Estimating Model Uncertainty of Neural Networks in Sparse Information Form [article]

Jongseok Lee, Matthias Humt, Jianxiang Feng, Rudolph Triebel
2020 arXiv   pre-print
We present a sparse representation of model uncertainty for Deep Neural Networks (DNNs) where the parameter posterior is approximated with an inverse formulation of the Multivariate Normal Distribution  ...  Our exhaustive theoretical analysis and empirical evaluations on various benchmarks show the competitiveness of our approach over the current methods.  ...  Jianxiang Feng is supported by the Munich School for Data Science (MUDS) and Rudolph Triebel is a member of MUDS.  ... 
arXiv:2006.11631v1 fatcat:2fwwrpi7ere2djavxzcz627xmy
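The "inverse formulation" in this entry is the information (precision) form of the Gaussian posterior. One convenient consequence, sketched below with placeholder values, is that samples can be drawn directly from a Cholesky factor of the precision without ever forming the dense covariance:

```python
import numpy as np

# Sampling from N(mu, Lambda^{-1}) given the precision Lambda = L L^T:
# x = mu + L^{-T} z has covariance L^{-T} L^{-1} = Lambda^{-1}.
def sample_information_form(mu, Lambda, rng):
    L = np.linalg.cholesky(Lambda)
    z = rng.standard_normal(mu.shape)
    return mu + np.linalg.solve(L.T, z)

rng = np.random.default_rng(4)
d = 5
Lambda = 4.0 * np.eye(d)          # precision (information) matrix, placeholder
mu = np.zeros(d)
theta_sample = sample_information_form(mu, Lambda, rng)
```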

Natural continual learning: success is a journey, not (just) a destination [article]

Ta-Chu Kao, Kristopher T. Jensen, Gido M. van de Ven, Alberto Bernacchia, Guillaume Hennequin
2021 arXiv   pre-print
However, these methods often exhibit subpar performance in both feedforward and recurrent neural networks, with recurrent networks being of interest to the study of neural dynamics supporting biological  ...  Biological agents are known to learn many different tasks over the course of their lives, and to be able to revisit previous tasks and behaviors with little to no loss in performance.  ...  Kronecker-factored approximation to the sums of Kronecker products: in this section, we consider three different Kronecker-factored approximations to the sum of two Kronecker products: X ⊗ Y ≈ Z = A ⊗  ... 
arXiv:2106.08085v2 fatcat:z25zqq37drgfbmm3jmdomymkjm
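The appendix fragment quoted above concerns compressing a sum of Kronecker products back into a single Kronecker product. One standard construction for this, shown below as a generic illustration (the paper may use different approximations), is the Van Loan–Pitsianis nearest Kronecker product: rearrange the matrix so that A ⊗ B becomes the rank-1 matrix vec(A) vec(B)ᵀ, then take the best rank-1 term by SVD.

```python
import numpy as np

# Nearest Kronecker product (Van Loan & Pitsianis): find A, B minimizing
# ||M - A kron B||_F by rearranging M and taking a rank-1 SVD. Shapes are
# placeholders; this is a generic construction, not the paper's rule.
def nearest_kronecker(M, a_shape, b_shape):
    m, n = a_shape
    p, q = b_shape
    # Rearrange so that each block M[i*p:(i+1)*p, j*q:(j+1)*q] becomes a row.
    R = M.reshape(m, p, n, q).transpose(0, 2, 1, 3).reshape(m * n, p * q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(s[0]) * U[:, 0].reshape(m, n)
    B = np.sqrt(s[0]) * Vt[0, :].reshape(p, q)
    return A, B

rng = np.random.default_rng(5)
X, Y = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
C, D = rng.standard_normal((3, 3)), rng.standard_normal((4, 4))
M = np.kron(X, Y) + np.kron(C, D)              # sum of two Kronecker products
A, B = nearest_kronecker(M, (3, 3), (4, 4))    # best single-term approximation
rel_err = np.linalg.norm(M - np.kron(A, B)) / np.linalg.norm(M)
```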

Optimizing Neural Networks with Kronecker-factored Approximate Curvature [article]

James Martens, Roger Grosse
2020 arXiv   pre-print
We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-Factored Approximate Curvature (K-FAC).  ...  It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices.  ...  We would like to thank Ilya Sutskever for his constructive comments on an early draft of this paper.  ... 
arXiv:1503.05671v7 fatcat:ilwuai2ssrauzkxsxljdjdvf3e
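For a single fully connected layer, the approximation this entry describes takes the Fisher block as A ⊗ G, with A the second moment of the layer inputs and G that of the backpropagated pre-activation gradients, so the natural-gradient step needs only two small inverses. A minimal NumPy sketch with made-up statistics and a simple damping term (an illustration of the idea, not the paper's full algorithm):

```python
import numpy as np

# K-FAC for one fully connected layer (sketch): F_layer ~ A kron G with
# A = E[a a^T] (inputs) and G = E[g g^T] (pre-activation gradients), so
# F^{-1} vec(grad_W) = vec(G^{-1} grad_W A^{-1}). Statistics are fake and the
# damping is a placeholder choice.
rng = np.random.default_rng(6)
batch, n_in, n_out = 256, 8, 5
a = rng.standard_normal((batch, n_in))          # layer inputs
g = rng.standard_normal((batch, n_out))         # backpropagated pre-activation grads
grad_W = g.T @ a / batch                        # ordinary gradient (n_out x n_in)

damping = 1e-3
A = a.T @ a / batch + damping * np.eye(n_in)    # input second-moment factor
G = g.T @ g / batch + damping * np.eye(n_out)   # gradient second-moment factor

nat_grad_W = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)   # K-FAC update direction
```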

Practical Gauss-Newton Optimisation for Deep Learning [article]

Aleksandar Botev, Hippolyt Ritter, David Barber
2017 arXiv   pre-print
We present an efficient block-diagonal approximation to the Gauss-Newton matrix for feedforward neural networks.  ...  Our resulting algorithm is competitive against state-of-the-art first order optimisation methods, with sometimes significant improvement in optimisation performance.  ...  Finally, we are grateful to James Martens for helpful discussions on the implementation of KFAC.  ... 
arXiv:1706.03662v2 fatcat:y3v6dpbswzgohp2frfcdof63re

Continual Learning With Extended Kronecker-Factored Approximate Curvature

Janghyeon Lee, Hyeong Gwon Hong, Donggyu Joo, Junmo Kim
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
The Hessian of a loss function represents the curvature of the quadratic penalty function, and a Kronecker-factored approximate curvature (K-FAC) is used widely to practically compute the Hessian of a  ...  We extend the K-FAC method so that the inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions.  ...  factorization, an approximate factorization of the Fisher that is computationally manageable, accurate, and amenable to cheap partial updates.  ... 
doi:10.1109/cvpr42600.2020.00902 dblp:conf/cvpr/LeeHJK20 fatcat:a34n3vcy6fca5h6iisplv2z3gm

WoodFisher: Efficient Second-Order Approximation for Neural Network Compression [article]

Sidak Pal Singh, Dan Alistarh
2020 arXiv   pre-print
Recently, there has been significant interest in utilizing this information in the context of deep neural networks; however, relatively little is known about the quality of existing approximations in this  ...  We demonstrate that WoodFisher significantly outperforms popular state-of-the-art methods for one-shot pruning.  ...  Also, we would like to thank Alexander Shevchenko, Alexandra Peste, and other members of the group for fruitful discussions.  ... 
arXiv:2004.14340v5 fatcat:n2k7d354a5emxe4y5zewpotnh4
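The core computational trick in this entry can be sketched in a few lines: the damped empirical Fisher λI + (1/N) Σᵢ gᵢgᵢᵀ is inverted by applying the Sherman–Morrison rank-one update once per gradient, so no full-matrix factorization is needed. The dense toy version below is only an illustration (the paper applies it blockwise, with its own damping and scaling choices):

```python
import numpy as np

# Sherman-Morrison recursion for the inverse of the damped empirical Fisher
# F = damping*I + (1/n) sum_i g_i g_i^T, one rank-one update per gradient.
def woodfisher_style_inverse(grads, damping=1e-3):
    n, d = grads.shape
    F_inv = np.eye(d) / damping                       # inverse of the damped prior term
    for g in grads:
        Fg = F_inv @ g
        F_inv -= np.outer(Fg, Fg) / (n + g @ Fg)      # update for adding (1/n) g g^T
    return F_inv

rng = np.random.default_rng(7)
grads = rng.standard_normal((64, 10))                 # per-example gradients (toy)
F_inv = woodfisher_style_inverse(grads)
F = 1e-3 * np.eye(10) + grads.T @ grads / 64
assert np.allclose(F_inv, np.linalg.inv(F), atol=1e-6)
```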

Continual Learning with Extended Kronecker-factored Approximate Curvature [article]

Janghyeon Lee, Hyeong Gwon Hong, Donggyu Joo, Junmo Kim
2020 arXiv   pre-print
The Hessian of a loss function represents the curvature of the quadratic penalty function, and a Kronecker-factored approximate curvature (K-FAC) is used widely to practically compute the Hessian of a  ...  We extend the K-FAC method so that the inter-example relations are taken into account and the Hessian of deep neural networks can be properly approximated under practical assumptions.  ...  [8] introduces Eigenvalue-corrected Kronecker factorization, an approximate factorization of the Fisher that is computationally manageable, accurate, and amenable to cheap partial updates.  ... 
arXiv:2004.07507v1 fatcat:iydgzqth4rbhndbdlc2jat3vmu

BackPACK: Packing more into backprop [article]

Felix Dangel, Frederik Kunstner, Philipp Hennig
2020 arXiv   pre-print
Its capabilities are illustrated by benchmark reports for computing additional quantities on deep neural networks, and an example application by testing several recent curvature approximations for optimization  ...  Yet, other quantities such as the variance of the mini-batch gradients or many approximations to the Hessian can, in theory, be computed efficiently, and at the same time as the gradient.  ...  The authors gratefully acknowledge financial support by the European Research Council through ERC StG Action 757275 / PANAMA; the DFG Cluster of Excellence "Machine Learning -New  ... 
arXiv:1912.10985v2 fatcat:ot4evhvsybhd7lv3rioel7b6oy
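A hedged sketch of how BackPACK is used: the model and loss are wrapped with extend, and a context manager requests extra quantities that are deposited on each parameter during the usual backward pass. The extension and attribute names below follow the paper and README from memory and may differ across versions:

```python
import torch
from backpack import backpack, extend
from backpack.extensions import DiagGGNExact

# Wrap model and loss so BackPACK can hook into the backward pass.
model = extend(torch.nn.Sequential(
    torch.nn.Linear(10, 5), torch.nn.ReLU(), torch.nn.Linear(5, 2)))
lossfunc = extend(torch.nn.CrossEntropyLoss())

X, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = lossfunc(model(X), y)

with backpack(DiagGGNExact()):          # request a diagonal GGN curvature estimate
    loss.backward()

for param in model.parameters():
    # Gradient as usual, plus the extra curvature quantity per parameter.
    print(param.grad.shape, param.diag_ggn_exact.shape)
```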
Showing results 1–15 out of 1,708 results