MODERN COMPUTATIONAL STRATEGIES FOR PROTEIN INFERENCE IN SHOTGUN PROTEOMICS

Y. S. Golenko, A. A. Ismailova
2021 Izvestiâ Nacionalʹnoj akademii nauk Respubliki Kazahstan. Seriâ fiziko-matematičeskaâ  
Today, shotgun proteomics is a powerful approach to characterizing proteomes in biological samples. Unlike the top-down proteomics strategy, shotgun proteomics is characterized by high separation efficiency and mass spectral sensitivity. At the same time, it places higher demands on the computational and statistical methods required for peptide identification, protein identification, and label-free quantification. The main purpose of shotgun proteomics is to identify the form and amount of each protein by combining liquid chromatography with tandem mass spectrometry. The analysis and interpretation of experimental data is the final and most important stage in proteomics; it also raises a large number of problems that require complex computational solutions. One of the most important tasks is the identification of the proteins present in the experimental sample. As a rule, this task is divided into two main components: assigning experimental tandem mass spectra to peptides derived from a protein database, and mapping the identified peptides to proteins while quantitatively assessing the reliability of the identified proteins. Assessing the reliability of the resulting data can itself be a separate, equally important and complex task. In this article, we treat protein identification purely as a problem of statistical inference and describe a number of methods that can be used to solve it. We classify the existing approaches into (1) rule-based methods, (2) combinatorial optimization methods, and (3) probabilistic inference methods. Integer programming and Bayesian inference frameworks are used to represent these methods. We also discuss the main problems of protein identification and suggest possible solutions to these problems.
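
To make the combinatorial optimization category concrete, the sketch below selects a small set of proteins that together explain every identified peptide. It is only an illustration and not the authors' method: it uses a greedy set-cover heuristic in place of an exact integer program, and the function name `greedy_parsimony` and the protein and peptide labels are hypothetical.

```python
from typing import Dict, List, Set


def greedy_parsimony(protein_to_peptides: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover heuristic (an approximation, not an exact ILP solution).

    Repeatedly picks the protein that explains the most still-unexplained
    peptides until every identified peptide is explained.
    """
    # All identified peptides that need to be explained by some protein.
    uncovered: Set[str] = set().union(*protein_to_peptides.values())
    selected: List[str] = []
    while uncovered:
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        selected.append(best)
        uncovered -= protein_to_peptides[best]
    return selected


# Hypothetical toy input: which peptides each candidate protein could produce.
example = {
    "protA": {"pep1", "pep2", "pep3"},
    "protB": {"pep2", "pep3"},
    "protC": {"pep4"},
    "protD": {"pep3", "pep4"},
}
result = greedy_parsimony(example)
print(result)  # ['protA', 'protC']
```

Here protB is dropped because protA already explains all of its peptides, while protC and protD tie on the remaining peptide and the tie is broken by name. Shared peptides of this kind are exactly what makes the inference ambiguous, which is why exact or probabilistic formulations may be preferred in practice.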
doi:10.32014/2021.2518-1726.21 fatcat:is52cmbl5ff6bognbbufrlcf2y