Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry
Significance of the study
Identifying proteoforms with primary structural alterations is essential to understanding protein functions and related biological processes. In this study, we present new protein sequence filtering algorithms that outperform existing ones for top-down mass spectrometry-based proteoform identification. Combining the filtering algorithms with existing spectral alignment algorithms will significantly improve the sensitivity of proteoform identification and facilitate the studies of proteoforms with alterations.
Abstract
Complex proteoforms contain various primary structural alterations resulting from variations in genes, RNA, and proteins. Top-down mass spectrometry is commonly used for analyzing complex proteoforms because it provides whole-sequence information on the proteoforms. Proteoform identification by top-down mass spectral database search is a challenging computational problem because the types and/or locations of some alterations in target proteoforms are, in general, unknown. Although spectral alignment and mass graph alignment algorithms have been proposed for identifying proteoforms with unknown alterations, they are extremely slow when aligning millions of spectra against tens of thousands of protein sequences in high-throughput, proteome-level analyses. Many software tools in this area combine efficient protein sequence filtering algorithms with spectral alignment algorithms to speed up database search. As a result, the performance of these tools relies heavily on the sensitivity and efficiency of their filtering algorithms. Here we propose two efficient approximate spectrum filtering algorithms for proteoform identification. We evaluated the performance of the proposed algorithms and four existing ones on simulated and real top-down mass spectrometry data sets. Experiments showed that the proposed algorithms outperformed the existing ones for complex proteoform identification. In addition, combining the proposed filtering algorithms with mass graph alignment algorithms identified many proteoforms missed by ProSightPC in proteome-level proteoform analyses.
This article is protected by copyright. All rights reserved.
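The filter-then-align strategy described in the abstract can be illustrated with a deliberately simple filter. The Python sketch below is not one of the algorithms proposed in this study; it is a generic shared-mass-counting filter written under stated assumptions: each spectrum has already been deconvoluted into a list of N-terminal prefix residue masses plus a precursor residue mass, each proteoform is full length (no truncation), and it carries at most one unknown mass shift. All function names (`filter_score`, `top_k_candidates`, and so on) are hypothetical. Because a single unknown alteration of mass Δ shifts every prefix mass after its site by Δ, and Δ can be estimated as the precursor mass minus the unmodified protein mass, each observed mass is checked against both the unshifted and the Δ-shifted prefix masses.

```python
from bisect import bisect_left

# Hypothetical sketch: a generic mass-count filter, NOT the paper's algorithms.
# Assumes deconvoluted prefix residue masses, full-length proteoforms, and
# at most one unknown mass shift per proteoform.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def residue_mass(seq):
    """Sum of residue masses of a protein sequence (no water added)."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def prefix_masses(seq):
    """Residue masses of all proper N-terminal prefixes, in increasing order."""
    masses, total = [], 0.0
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(total)
    return masses


def has_match(sorted_masses, target, tol):
    """True if some mass in sorted_masses lies within target +/- tol."""
    i = bisect_left(sorted_masses, target - tol)
    return i < len(sorted_masses) and sorted_masses[i] <= target + tol


def filter_score(spec_masses, spec_residue_mass, seq, tol=0.02):
    """Count spectrum prefix masses explained by seq, assuming at most one
    unknown mass shift whose size is inferred from the precursor mass."""
    prefixes = prefix_masses(seq)
    shift = spec_residue_mass - residue_mass(seq)
    score = 0
    for m in spec_masses:
        # Masses before the alteration site match directly; masses after it
        # are offset by `shift`. Each spectrum mass is counted at most once.
        if has_match(prefixes, m, tol) or has_match(prefixes, m - shift, tol):
            score += 1
    return score


def top_k_candidates(spec_masses, spec_residue_mass, database, k=2, tol=0.02):
    """Rank proteins by filter_score (ties broken by name); keep the k best."""
    scored = [(filter_score(spec_masses, spec_residue_mass, seq, tol), name)
              for name, seq in database.items()]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    db = {"A": "MKTAYIAKQR", "C": "GGGGGGGGGG"}
    delta = 79.96633  # e.g. a phosphorylation on residue 5 (Y) of protein A
    pre = prefix_masses(db["A"])
    spec = [p + (delta if j >= 4 else 0.0) for j, p in enumerate(pre)]
    print(top_k_candidates(spec, residue_mass(db["A"]) + delta, db))  # ['A', 'C']
```

In such a pipeline, only the top-k proteins returned by the filter would be passed on to the slower spectral alignment or mass graph alignment step, so a filter that ranks the correct protein below the cutoff directly costs identification sensitivity.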