A cross-validation scheme for machine learning algorithms in shotgun proteomics

Viktor Granholm, William Noble, Lukas Käll
2012 BMC Bioinformatics  
Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
Granholm et al. BMC Bioinformatics 2012, 13(Suppl 16):S3
doi:10.1186/1471-2105-13-s16-s3 pmid:23176259 pmcid:PMC3489528 fatcat:ohqvuwz4abe2vdem6kotmprooq
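The abstract relies on two mechanisms: estimating error rates from decoy matches, and validating a learned scoring function by cross-validation so that no match is scored by a model that was trained on it. The Python below is a minimal sketch of both, not the authors' implementation or Percolator's. All names (`qvalues`, `cross_validated_scores`, `mean_difference_trainer`, `simulate_psms`) are hypothetical. It assumes equally sized target and decoy databases, represents each peptide-spectrum match (PSM) as a feature vector plus a decoy flag, and uses a toy "mean target minus mean decoy" linear scorer as a stand-in for a real semi-supervised learner. The simulated data are made up for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical PSM representation: (feature vector, is_decoy).
PSM = Tuple[List[float], bool]
Scorer = Callable[[List[float]], float]
Trainer = Callable[[List[PSM]], Scorer]


def qvalues(scores: Sequence[float], is_decoy: Sequence[bool]) -> List[float]:
    """Target-decoy q-values, assuming equally sized target and decoy databases.

    The FDR at each threshold is estimated as #decoys / #targets scoring at or
    above it (capped at 1); the q-value is the smallest FDR at that threshold
    or any lower one. Tied scores are not grouped here.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = min(1.0, decoys / max(targets, 1))
    q = [0.0] * len(scores)
    running_min = float("inf")
    for i in reversed(order):  # walk from the lowest score upwards
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def cross_validated_scores(psms: List[PSM], train: Trainer, k: int = 3,
                           seed: int = 0) -> List[float]:
    """Score every PSM with a model trained only on the other k-1 folds."""
    idx = list(range(len(psms)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # disjoint folds covering all PSMs
    scores = [0.0] * len(psms)
    for test_idx in folds:
        held_out = set(test_idx)
        model = train([psms[i] for i in idx if i not in held_out])
        for i in test_idx:
            scores[i] = model(psms[i][0])
    return scores


def mean_difference_trainer(training: List[PSM]) -> Scorer:
    """Toy linear scorer (an assumption, not the paper's learner):
    weights = mean target features - mean decoy features.
    Requires at least one target and one decoy in the training set."""
    dims = len(training[0][0])

    def mean(rows: List[List[float]]) -> List[float]:
        return [sum(r[d] for r in rows) / len(rows) for d in range(dims)]

    w = [t - d for t, d in zip(mean([x for x, dec in training if not dec]),
                               mean([x for x, dec in training if dec]))]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def simulate_psms(n: int, seed: int = 1) -> List[PSM]:
    """Made-up data: n decoys, and n targets of which half are 'correct'
    (shifted features) and half are drawn like decoys."""
    rng = random.Random(seed)
    psms: List[PSM] = []
    for j in range(n):
        shift = 1.5 if j % 2 == 0 else 0.0
        psms.append(([rng.gauss(shift, 1.0), rng.gauss(shift / 2, 1.0)], False))
        psms.append(([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], True))
    return psms


if __name__ == "__main__":
    psms = simulate_psms(300)
    scores = cross_validated_scores(psms, mean_difference_trainer, k=3)
    q = qvalues(scores, [dec for _, dec in psms])
    # Number of target PSMs accepted at an estimated 1% FDR.
    print(sum(1 for qi, (_, dec) in zip(q, psms) if qi <= 0.01 and not dec))
```

Because each fold is scored by a different model, the merged scores are not automatically on a common scale; a real implementation would need to make per-fold scores comparable before computing q-values, and would keep all PSMs from the same spectrum in the same fold. This sketch skips both steps for brevity.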