Correlated Parameters to Accurately Measure Uncertainty in Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems
This article presents a novel approach for training deep neural networks using Bayesian techniques. The Bayesian methodology allows for an easy evaluation of model uncertainty and is robust to overfitting, the two main problems that classical, i.e., non-Bayesian, architectures commonly struggle with. The proposed approach applies variational inference to approximate the intractable posterior distribution. In particular, the variational distribution is defined as the product of multiple multivariate normal distributions with tridiagonal covariance matrices. Each normal distribution belongs either to the weights or to the biases of one network layer. The layerwise a posteriori variances are defined based on the corresponding expectation values, and the correlations are assumed to be identical. Therefore, only a few additional parameters need to be optimized compared with non-Bayesian settings. The performance of the new approach is evaluated and compared with other recently developed Bayesian methods. The performance evaluations are based on the popular benchmark data sets MNIST and CIFAR-10. Among the considered approaches, the proposed one shows the best predictive accuracy. Moreover, extensive evaluations of the provided prediction uncertainty information indicate that the new approach often yields more useful uncertainty estimates than the comparison methods.
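To make the structure of the variational distribution concrete, the following is a minimal NumPy sketch of a single layerwise factor: a multivariate normal whose covariance matrix is tridiagonal, whose variances are derived from the means, and whose neighboring correlations share one value. The abstract does not state the exact variance rule, so the form sigma_i = tau * |mu_i| used here is an assumption made purely for illustration, as are the names `tridiagonal_covariance`, `sample_layer`, `tau`, and `rho`. The sketch also assumes nonzero means and |rho| < 0.5, which keeps the matrix positive definite.

```python
import numpy as np


def tridiagonal_covariance(mu, tau, rho):
    """Tridiagonal covariance for one layer's parameters.

    Assumption (not taken from the source): the standard deviation of
    each parameter is tau * |mu_i|, and every pair of neighboring
    parameters shares the same correlation rho.
    """
    sigma = tau * np.abs(mu)                 # variances tied to the means
    cov = np.diag(sigma ** 2)                # main diagonal: variances
    off = rho * sigma[:-1] * sigma[1:]       # neighbor covariances
    cov += np.diag(off, k=1) + np.diag(off, k=-1)
    return cov


def sample_layer(mu, tau, rho, rng, n_samples=1):
    """Draw parameter samples via the reparameterization mu + L @ eps."""
    chol = np.linalg.cholesky(tridiagonal_covariance(mu, tau, rho))
    eps = rng.standard_normal((n_samples, mu.size))
    return mu + eps @ chol.T


rng = np.random.default_rng(0)
mu = np.array([0.5, -1.0, 2.0, 0.25])        # hypothetical layer means
cov = tridiagonal_covariance(mu, tau=0.1, rho=0.3)
samples = sample_layer(mu, tau=0.1, rho=0.3, rng=rng, n_samples=5)
```

Under these assumptions, a layer with n parameters adds only the two scalars `tau` and `rho` on top of the n means, which illustrates why such a parameterization needs few additional parameters compared with a non-Bayesian network.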