A method for constructing a confidence bound for the actual error rate of a prediction rule in high dimensions

Kevin K. Dobbin
2008 Biostatistics  
Constructing a confidence interval for the actual, conditional error rate of a prediction rule from multivariate data is problematic because this error rate is not a population parameter in the traditional sense; it is a functional of the training set. When the training set changes, so does this "parameter." A valid method for constructing confidence intervals for the actual error rate was previously developed by McLachlan. However, McLachlan's method cannot be applied in many cancer ... settings because it requires the number of samples to be much larger than the number of dimensions (n >> p), and it assumes that no dimension-reducing feature selection step is performed. Here, an alternative to McLachlan's method is presented that can be applied when p >> n, with an additional adjustment in the presence of feature selection. Coverage probabilities of the new method are shown to be nominal or conservative over a wide range of scenarios. The new method is relatively simple to implement and not computationally burdensome.
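The point that the actual error rate is a functional of the training set can be illustrated with a small simulation. The sketch below is not the method of the paper and not McLachlan's method; it only shows, under assumed conditions, how the conditional error rate moves when the training set is redrawn. The assumptions are: two Gaussian classes with identity covariance, a nearest-centroid rule, equal class priors, and p >> n. All function names and parameter values are hypothetical choices made for illustration.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def draw_training_set(rng, n_per_class, mu0, mu1):
    """Draw n_per_class samples per class from N(mu, I) (assumed model)."""
    class0 = [[rng.gauss(m, 1.0) for m in mu0] for _ in range(n_per_class)]
    class1 = [[rng.gauss(m, 1.0) for m in mu1] for _ in range(n_per_class)]
    return class0, class1


def centroid(samples):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def actual_error_rate(class0, class1, mu0, mu1):
    """Exact conditional error of the nearest-centroid rule built from
    this particular training set, assuming N(mu, I) classes and equal priors.
    The rule assigns x to class 1 when w.x > c."""
    m0, m1 = centroid(class0), centroid(class1)
    w = [b - a for a, b in zip(m0, m1)]
    norm_w = math.sqrt(sum(v * v for v in w))
    c = sum(wi * (a + b) / 2.0 for wi, a, b in zip(w, m0, m1))
    proj0 = sum(wi * m for wi, m in zip(w, mu0))
    proj1 = sum(wi * m for wi, m in zip(w, mu1))
    err0 = 1.0 - normal_cdf((c - proj0) / norm_w)  # class 0 called class 1
    err1 = normal_cdf((c - proj1) / norm_w)        # class 1 called class 0
    return 0.5 * (err0 + err1)


def simulate(n_training_sets=5, n_per_class=10, p=200,
             n_informative=10, shift=1.0, seed=0):
    """Return one actual error rate per independently drawn training set."""
    rng = random.Random(seed)
    mu0 = [0.0] * p
    mu1 = [shift] * n_informative + [0.0] * (p - n_informative)
    rates = []
    for _ in range(n_training_sets):
        class0, class1 = draw_training_set(rng, n_per_class, mu0, mu1)
        rates.append(actual_error_rate(class0, class1, mu0, mu1))
    return rates


if __name__ == "__main__":
    for i, rate in enumerate(simulate(), start=1):
        print(f"training set {i}: actual error rate = {rate:.3f}")
```

Each training set yields a different rule and therefore a different actual error rate, even though the underlying populations are fixed. This is the moving target that a confidence bound for the conditional error rate has to cover.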
doi:10.1093/biostatistics/kxn035 pmid:19039030 pmcid:PMC2733174 fatcat:lzuy74avefdqtja6zec3hdzazq