A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science
[article]
2018
arXiv
pre-print
Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being influenced by answers to previous queries) a data set containing n samples may support exponentially many queries in n. This number reduces to linearly many under naive adaptive data analysis, and even sophisticated remedies such as the Reusable Holdout (Dwork et.
arXiv:1809.05596v1
fatcat:46qsf3p6szbtled26hf2megepa