Correcting mistakes in predicting distributions

Valérie Marot-Lassauzaie, Michael Bernhofer, Burkhard Rost
2018 Bioinformatics  
Motivation: Many applications monitor predictions of a whole range of features for biological datasets, e.g. the fraction of secreted human proteins in the human proteome. Results and error estimates are typically derived from publications. Results: Here, we present a simple, alternative approximation that uses performance estimates of methods to error-correct the predicted distributions. This approximation uses the confusion matrix (TP true positives, TN true negatives, FP false positives and FN false negatives) describing the performance of the prediction tool for correction. As proof-of-principle, the correction was applied to a two-class (membrane/not) and to a seven-class (localization) prediction. Availability and implementation: Datasets and a simple JavaScript tool available freely for all users at http://www.rostlab.org/services/distributions.
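To make the idea of error-correcting a predicted distribution with a confusion matrix concrete, the following is a minimal two-class sketch. It is an illustration under stated assumptions, not necessarily the formula used by the authors or by their JavaScript tool: it assumes that the error rates measured on the benchmark carry over to the new dataset, and it inverts the relation observed = sensitivity × true + false-positive rate × (1 − true). The function name `correct_positive_fraction` and the counts in the example are hypothetical.

```python
def correct_positive_fraction(observed_fraction, tp, tn, fp, fn):
    """Estimate the true positive-class fraction from a predicted one.

    Assumption (not taken from the abstract): the predictor's error rates
    on its benchmark also hold on the dataset being corrected.
    """
    sensitivity = tp / (tp + fn)          # P(predicted + | truly +)
    false_positive_rate = fp / (fp + tn)  # P(predicted + | truly -)

    denominator = sensitivity - false_positive_rate
    if denominator == 0:
        # A predictor no better than chance carries no usable signal.
        raise ValueError("sensitivity equals false-positive rate; correction undefined")

    # Solve observed = sensitivity * p + false_positive_rate * (1 - p) for p.
    corrected = (observed_fraction - false_positive_rate) / denominator

    # Sampling noise can push the estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, corrected))


# Hypothetical benchmark counts: sensitivity 0.9, false-positive rate 0.1.
example = correct_positive_fraction(0.30, tp=90, tn=180, fp=20, fn=10)
print(round(example, 3))
```

With these made-up counts, a predicted membrane fraction of 30% is corrected to about 25%, because part of the predicted positives is attributed to false positives. The same reasoning extends to more classes by solving the corresponding linear system built from the row-normalised confusion matrix.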
doi:10.1093/bioinformatics/bty346 pmid:29762646 fatcat:r4m5twjpzraglbnx7k5x6g7bii