Lost and found: re-searching and re-scoring proteomics data aids the discovery of bacterial proteins and improves proteome coverage [article]

Patrick Willems, Igor Fijalkowski, Petra Van Damme
2019 bioRxiv pre-print
Prokaryotic genome annotation is heavily dependent on automated gene annotation pipelines that are prone to propagate errors and underestimate genome complexity. We describe an optimized proteogenomic workflow that uses ribo-seq and proteomic data of Salmonella Typhimurium to identify unannotated proteins or alternative protein forms arising from alternative translation initiation (i.e. N-terminal proteoforms). This data analysis encompasses the searching of co-fragmenting peptides and post-processing with extended peptide-to-spectrum quality features, including comparison to predicted fragment ion intensities. Applying this strategy yields greater proteome depth as well as greater confidence in unannotated peptide hits. We demonstrate the general applicability of our pipeline by re-analyzing public Deinococcus radiodurans datasets. Taken together, systematic re-analysis of available prokaryotic (proteome) datasets holds great promise to assist in experimentally based genome annotation.
doi:10.1101/2019.12.18.881375 fatcat:lmwzbpozvjb6vnpbcrn3g3jfai