Bioinformatics analysis of proteins and proteomes

Islam Mohammad
The advancement of next-generation proteomics methodologies has led to an explosion in proteomics data. However, the analysis and interpretation of this data remains a challenge, as several proteins remain unannotated and uncharacterised for many organisms. Despite the large volume of mass spectrometry (MS) data in various datasets, over 10% of human proteins are still considered "missing". Bioinformatics techniques can be used to provide comprehensive annotations for entire
[...] to provide valuable information regarding putative functions of proteins that can be validated and/or supplemented with experimental data. This thesis aims to tackle some of these challenges. The first aim was to develop a generic in silico bioinformatics pipeline to identify homologues and map putative functional signatures, gene ontology terms and biochemical pathways for novel organisms or "missing" proteins. This pipeline was used to identify homologues for 2,587 proteins and to functionally annotate 2,486 proteins from the black Périgord truffle (Tuber melanosporum Vittad.), followed by MS-based shotgun proteomics to validate 836 proteins. The same pipeline was then used to annotate the human "missing" protein sequences on each human chromosome, available through the ProtAnnotator web portal, finding homologues from the mammalian kingdom for 2,538 of them (66.2%, based on September 2013 data). ProtAnnotator also functionally annotated 1,945 (50.8%) "missing" human proteins. ProtAnnotator 2.0 automated the process and provides an update to the annotation of the truffle proteome. The lack of coherency between proteomics data submitted to various databases and processed by different search engines has limited their integration in the quest to uncover human "missing" proteins. To this end, a scheme was devised for comparing proteomics data from different sources in terms of proteotypicity and search engine scores, together with guidelines on spectral quality analysis. Finally, ProtAnnotator and the proteomics data integration [...]
doi:10.25949/19433357 fatcat:idrznpnkmfctfoxypc5cztsn3m