10 Hits in 7.0 sec

In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors [article]

Jeffrey Negrea, Gintare Karolina Dziugaite, Daniel M. Roy
2021 arXiv   pre-print
We propose to study the generalization error of a learned predictor ĥ in terms of that of a surrogate (potentially randomized) predictor that is coupled to ĥ and designed to trade empirical risk for control  ...  We also show that replacing ĥ by its conditional distribution with respect to an arbitrary σ-field is a convenient way to derandomize.  ...  error of learned classifier via uniform convergence of a suitable derandomized classifier.  ... 
arXiv:1912.04265v3 fatcat:zse5m4rqanftfpbed2xq45o7aq
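
The surrogate idea in this abstract can be summarized by a standard three-term decomposition. The sketch below is only an illustrative reading of the snippet; the notation (population risk R, empirical risk R̂_n, surrogate h̃ given by the conditional law of ĥ with respect to a σ-field 𝓕) is ours rather than quoted from the paper:

\[
R(\hat h) - \hat R_n(\hat h)
  = \big(R(\hat h) - R(\tilde h)\big)
  + \big(R(\tilde h) - \hat R_n(\tilde h)\big)
  + \big(\hat R_n(\tilde h) - \hat R_n(\hat h)\big),
\qquad \tilde h \sim \mathrm{Law}\big(\hat h \mid \mathcal{F}\big).
\]

Here the middle term is a uniform-convergence term for the derandomized surrogate h̃, while the outer terms measure how closely h̃ tracks ĥ in population and empirical risk (the "empirical risk traded for control" mentioned in the snippet). Choosing a coarser σ-field makes the surrogate class smaller and uniform convergence easier, at the price of the coupling terms.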

On Uniform Convergence and Low-Norm Interpolation Learning [article]

Lijia Zhou and Danica J. Sutherland and Nathan Srebro
2021 arXiv   pre-print
But we argue we can explain the consistency of the minimal-norm interpolator with a slightly weaker, yet standard, notion: uniform convergence of zero-error predictors in a norm ball.  ...  We consider an underdetermined noisy linear regression model where the minimum-norm interpolating predictor is known to be consistent, and ask: can uniform convergence in a norm ball, or at least (following  ...  "In Defense of Uniform Convergence: Generalization via derandomization with an application to interpolating predictors."  ... 
arXiv:2006.05942v3 fatcat:q6sday3xfzfb5fcertowseduii
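
To make the setting in this abstract concrete, here is a minimal numpy sketch of the minimum-norm interpolator in an underdetermined noisy linear model; the dimensions, noise level, and variable names are illustrative assumptions, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                   # underdetermined: d >> n
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d) / np.sqrt(d)     # ground-truth signal
y = X @ w_star + 0.1 * rng.standard_normal(n)    # noisy labels

# Minimum-l2-norm interpolator: w_hat = X^+ y = X^T (X X^T)^{-1} y
w_hat = np.linalg.pinv(X) @ y

print("train MSE:", np.mean((X @ w_hat - y) ** 2))   # ~0: it interpolates
print("||w_hat||:", np.linalg.norm(w_hat))           # radius of the relevant norm ball

Uniform convergence of zero-error predictors in a norm ball, as described above, then refers to bounding the population risk uniformly over all weight vectors of norm at most roughly ||w_hat|| that also achieve zero training error.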

Exact Gap between Generalization Error and Uniform Convergence in Random Feature Models [article]

Zitong Yang, Yu Bai, Song Mei
2021 arXiv   pre-print
We show that, in the setting where the classical uniform convergence bound is vacuous (diverges to ∞), uniform convergence over the interpolators still gives a non-trivial bound of the test error of interpolating  ...  Recent work showed that there could be a large gap between the classical uniform convergence bound and the actual test error of zero-training-error predictors (interpolators) such as deep neural networks  ...  In defense of uniform convergence: Generalization via derandomization with an application to interpolating predictors. In International Conference on Machine Learning, pp. 7263-7272.  ... 
arXiv:2103.04554v1 fatcat:n2jfpgykpffjbhxkm37sz4dvsm
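
The random feature model this abstract refers to can be sketched as follows; the ReLU feature map, dimensions, and data model below are assumptions for illustration only, and the paper's actual contribution (the exact gap between the two uniform-convergence quantities and the test error) is analytical rather than numerical.

import numpy as np

rng = np.random.default_rng(1)
n, d, N = 100, 20, 2000                  # samples, input dim, random features (N >> n)
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, d)) / np.sqrt(d)   # random first layer, kept fixed
Phi = np.maximum(X @ W.T, 0.0)                 # n x N random-feature design matrix

# Minimum-norm interpolator over the random feature class
a_hat = np.linalg.pinv(Phi) @ y
print("train MSE:", np.mean((Phi @ a_hat - y) ** 2))
print("||a_hat||:", np.linalg.norm(a_hat))

The contrast drawn in the abstract is between the supremum of the generalization gap over the whole norm ball ||a|| <= ||a_hat|| (which can diverge) and the same supremum restricted to interpolators (which can still give a non-trivial test-error bound).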

Explaining generalization in deep learning: progress and fundamental limits [article]

Vaishnavh Nagarajan
2021 arXiv   pre-print
With this realization in mind, in the last part of the thesis, we will change course and introduce an empirical technique to estimate generalization using unlabeled data.  ...  Given its popularity, in this thesis, we will also take a step back to identify the fundamental limits of uniform convergence as a tool to explain generalization.  ...  bounds via uniform convergence.  ... 
arXiv:2110.08922v1 fatcat:blzj6rhlffgrjpeaj2ia57g6gq
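
The thesis abstract mentions an empirical technique for estimating generalization from unlabeled data. One common instantiation of that idea, offered here only as an illustration (the helper below and its .predict interface are hypothetical, and this need not be the exact estimator used in the thesis), measures the disagreement of two independently trained classifiers on unlabeled inputs:

import numpy as np

def disagreement_rate(model_a, model_b, unlabeled_X):
    """Fraction of unlabeled points on which two independently trained
    classifiers disagree, used as a label-free proxy for test error."""
    return float(np.mean(model_a.predict(unlabeled_X) != model_b.predict(unlabeled_X)))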

The Sample Complexity of One-Hidden-Layer Neural Networks [article]

Gal Vardi, Ohad Shamir, Nathan Srebro
2022 arXiv   pre-print
We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger  ...  smooth (with the result extending to deeper networks); and second, for certain types of convolutional networks.  ...  Acknowledgements This research is supported in part by European Research Council (ERC) grant 754705, and NSF-BSF award 1718970.  ... 
arXiv:2202.06233v1 fatcat:onrhz4ktvnexpjrrjetfv6hx5a
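
For reference, the two kinds of norm control contrasted in this abstract can be computed directly on a one-hidden-layer network's weight matrix. The Frobenius norm below is shown as one natural example of a "stronger" norm, which is an assumption on our part, and all sizes are illustrative.

import numpy as np

rng = np.random.default_rng(2)
d, width = 100, 4096
W = rng.standard_normal((width, d)) / np.sqrt(d)   # hidden-layer weight matrix
v = rng.standard_normal(width) / np.sqrt(width)    # output-layer weights

def one_hidden_layer(x, W, v):
    """f(x) = v . relu(W x), the architecture discussed in the abstract."""
    return v @ np.maximum(W @ x, 0.0)

print("spectral norm  ||W||_2:", np.linalg.norm(W, 2))
print("Frobenius norm ||W||_F:", np.linalg.norm(W, 'fro'))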

Data splitting improves statistical performance in overparametrized regimes [article]

Nicole Mücke, Enrico Reiss, Jonas Rungenhagen, Markus Klein
2021 arXiv   pre-print
We further provide a unified framework that allows us to analyze both the finite- and infinite-dimensional settings. We numerically demonstrate the effect of different model parameters.  ...  While large training datasets generally offer improvement in model performance, the training process becomes computationally expensive and time consuming.  ...  In defense of uniform convergence: Generalization via derandomization with an application to interpolating predictors. In International Conference on Machine Learning, pages 7263-7272.  ... 
arXiv:2110.10956v1 fatcat:kggbrkt7onc3pagrmlwfcfp5oy
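
A minimal sketch of the data-splitting idea, under the assumption (ours, for illustration) that "splitting" means fitting a minimum-norm least-squares predictor on each disjoint chunk of the sample and averaging the results:

import numpy as np

def split_and_average(X, y, n_splits):
    """Fit a minimum-norm least-squares predictor on each disjoint chunk
    and average the weight vectors (illustrative sketch only)."""
    chunks = np.array_split(np.arange(len(y)), n_splits)
    ws = [np.linalg.pinv(X[idx]) @ y[idx] for idx in chunks]
    return np.mean(ws, axis=0)

rng = np.random.default_rng(3)
n, d = 200, 1000                                  # overparametrized regime: d > n
X = rng.standard_normal((n, d))
y = X @ (rng.standard_normal(d) / np.sqrt(d)) + 0.1 * rng.standard_normal(n)

w_split = split_and_average(X, y, n_splits=4)
w_full = np.linalg.pinv(X) @ y
print("||w_split|| =", np.linalg.norm(w_split), " ||w_full|| =", np.linalg.norm(w_full))

Each split costs far less to fit than the full problem, which is the computational motivation the abstract alludes to.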

Dissecting Non-Vacuous Generalization Bounds based on the Mean-Field Approximation [article]

Konstantinos Pitas
2020 arXiv   pre-print
We investigate common explanations, such as the failure of VI due to problems in optimization or choosing a suboptimal prior.  ...  Explaining how overparametrized neural networks simultaneously achieve low risk and zero empirical risk on benchmark datasets is an open problem.  ...  In defense of uniform convergence: Generalization via derandomization with an application to interpolating predictors. arXiv preprint arXiv:1912.04265, 2019.  ... 
arXiv:1909.03009v2 fatcat:beopwgmrabbtnnh4rcvkleye4m

Computer Science in High Performance Sport – Applications and Implications for Professional Coaching (Dagstuhl Seminar 13272)

Benjamin Doerr, Nikolaus Hansen, Jonathan Shapiro, L. Darrell Whitley, Koen Lemmink, Stuart Morgan, Jaime Sampaio, Dietmar Saupe, Mai Gehrke, Jean-Eric Pin, Victor Selivanov (+5 others)
2013 unpublished
We acknowledge support from the Strategic Basic Research (SBO) Programme of the Flemish Agency for Innovation through Science and Technology (IWT) in the context of the SPION project under grant agreement  ...  These parameters are sent to a smartphone application via ANT+™. The measured data is then transmitted to an application server using wireless communication technologies (UMTS, HSUPA).  ...  In addition, there appears to be a regime for the number of data points (n) and the number of cluster centers (k) where generated problems cause k-means to take much longer to converge.  ... 
fatcat:yg5vvzymuvcibirco4nawxes4m
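
The k-means observation in the snippet above (a regime of n and k where convergence is slow) is easy to probe with a toy experiment; the simple Lloyd iteration below and the 2-D Gaussian data are illustrative assumptions, not the seminar's actual benchmark.

import numpy as np

def lloyd_iterations(X, k, max_iter=1000, seed=0):
    """Run Lloyd's k-means and return the number of iterations until the
    cluster assignment stops changing (a simple proxy for convergence time)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    labels = None
    for it in range(1, max_iter + 1):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return it
        labels = new_labels
        for j in range(k):
            pts = X[labels == j]
            if len(pts):                      # keep the old center if a cluster empties
                centers[j] = pts.mean(axis=0)
    return max_iter

rng = np.random.default_rng(0)
for n, k in [(500, 5), (500, 50), (5000, 50)]:
    X = rng.standard_normal((n, 2))
    print(f"n={n:5d}, k={k:3d}: converged after {lloyd_iterations(X, k)} iterations")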

Dagstuhl Reports, Volume 6, Issue 11, November 2016, Complete Issue [article]

2017
, their use in potential functions methods, in SVM, the general "kernel trick", the observation that kernels can be defined on arbitrary sets of objects, the link to GPs, and finally the idea to represent  ...  Kernel mean representations lend themselves well to the development of kernel methods for probabilistic programming, i.e., methods for lifting functional operations defined for data types to the same functional  ...  time O(n^2) and space O(√n), derandomizing an algorithm of Wang.  ... 
doi:10.4230/dagrep.6.11 fatcat:tfkdfittpjdydfv7ejvk4bvnh4

2016 Jahresbericht Annual Report

Stefan Jähnichen, Raimund Seidel (Geschäftsführung), Heike Meißner, Gesellschafter
unpublished
We organizers are all thankful to the participants, who all brought a unique insight to the seminar, which, in my humble opinion, succeeded in its aims.  ...  The organizers would like to express their gratitude to all participants of the Seminar.  ...  Therefore, these boosting techniques lack efficiency in case dependent tasks of an application are mapped to two different cores or, in general, for multiple concurrently executing applications with distinctive  ... 
fatcat:oef4sw3ebbcwhe7qbtizni4lfy