Community Evaluation of Glycoproteomics Informatics Solutions Reveals High-Performance Search Strategies of Glycopeptide Data [article]

Rebeca Kawahara, Kathirvel Alagesan, Marshall Bern, Weiqian Cao, Robert J Chalkley, Kai Cheng, Matthew S Choo, Nathan Edwards, Radoslav Goldman, Marcus Hoffmann, Yingwei Hu, Yifan Huang (+41 others)
2021 bioRxiv pre-print
Glycoproteome profiling (glycoproteomics) remains a considerable analytical challenge that hinders rapid progress in glycobiology. The complex tandem mass spectra generated from glycopeptide mixtures require sophisticated analysis pipelines for structural determination. Diverse informatics solutions aiding the process have appeared, but their relative strengths and weaknesses remain untested. Conducted through the Human Proteome Project - Human Glycoproteomics Initiative, this community study comprising both developers and expert users of glycoproteomics software is the first to evaluate the relative performance of current informatics solutions for comprehensive glycopeptide analysis. High-quality LC-MS/MS-based glycoproteomics datasets of N- and O-glycopeptides from serum proteins were shared with all teams. The relative team performance for efficient glycopeptide data analysis was systematically established through multiple orthogonal performance tests. Excitingly, several high-performance glycoproteomics informatics solutions and tools displaying a considerable performance potential were identified. While the study illustrated that significant informatics challenges remain in the analysis of glycopeptide data, as indicated by a high discrepancy between the reported glycopeptides, a substantial list of commonly reported high-confidence glycopeptides could be extracted from the team reports. Further, the team performance profiles were correlated to the many study variables, which revealed important performance-associated search settings and search output variables, some intuitive, others unexpected. This study concludes that diverse informatics solutions for comprehensive glycopeptide data analysis exist within the community, points to several high-performance search strategies, and specifies key variables that may guide future software developments and assist the experimental decision-making of practitioners in glycoproteomics.
doi:10.1101/2021.03.14.435332 fatcat:wcssvwpnvfaltif2dbqtzibcge