Assessing Generalization of SGD via Disagreement [article]

Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
2022 arXiv pre-print
We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on – and is a stronger version of – the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the well-calibrated nature of ensembles of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.
arXiv:2106.13799v2 fatcat:f4ytduv6nvh2lcoltxuvgwzbbq
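
The disagreement rate described in the abstract can be illustrated with a minimal sketch. It assumes the predicted labels of two networks (same architecture, same training set, different SGD runs) on the same unlabeled test inputs are already available as sequences. The function name `disagreement_rate` and the toy labels are hypothetical and do not come from the paper; no training is performed here.

```python
from typing import Sequence


def disagreement_rate(preds_a: Sequence[int], preds_b: Sequence[int]) -> float:
    """Fraction of inputs on which two models predict different labels.

    No ground-truth labels are needed: only the two prediction lists.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("both models must be evaluated on the same inputs")
    if len(preds_a) == 0:
        raise ValueError("need at least one prediction")
    # Count positions where the two runs disagree.
    disagreements = sum(a != b for a, b in zip(preds_a, preds_b))
    return disagreements / len(preds_a)


# Toy predictions from two hypothetical SGD runs on 8 unlabeled test points.
run_1 = [0, 1, 2, 1, 0, 2, 1, 0]
run_2 = [0, 1, 1, 1, 0, 2, 0, 0]

# The runs differ at positions 2 and 6, so the rate is 2 / 8.
print(disagreement_rate(run_1, run_2))  # 0.25
```

According to the abstract, this quantity serves as an estimate of the test error when ensembles of SGD-trained models are well calibrated; the sketch only computes the rate and does not check that condition.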