PGMiner: Complete proteogenomics workflow; from data acquisition to result visualization

Canan Has, Jens Allmer
2017 Information Sciences  
In parallel with the development of nucleotide sequencing, an equally important interest arose in describing sequences in terms of function, and this functional description is now the bottleneck in the overall research question. Sequencing the transcriptome determines which nucleotide sequences are expressed, while mass spectrometry allows sequencing at the protein level. Both approaches can sequence only a subset of the existing transcripts. Moreover, some events, such as post-translational modifications, can only be detected at the proteomics level. It is therefore essential to combine proteomics and genomics, and proteogenomics data analysis pipelines have been described for this purpose. Here, we describe a novel proteogenomics workflow that covers everything from data acquisition to result visualization in the Konstanz Information Miner (KNIME), a state-of-the-art workflow management and data analytics platform. We extended KNIME with a number of processes, such as peptide consensus prediction, peptide mapping, database equalizing, and result visualization. This enabled the construction of our new workflow, PGMiner, which not only includes all data analysis steps but is also highly customizable, whereas customization is cumbersome in most existing pipelines. Furthermore, PGMiner requires no burdensome installation process, which in our assessment makes it the most user-friendly tool available.
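Peptide mapping, one of the processes named above, places peptides identified by mass spectrometry onto genomic sequence. The abstract does not say how PGMiner implements this step, so the following is only a minimal sketch of one common approach (six-frame translation followed by exact substring search), not PGMiner's method. The names `translate` and `map_peptide` are hypothetical, and the sketch assumes an uppercase ACGT genome string and the standard genetic code (NCBI translation table 1).

```python
from itertools import product

# Standard genetic code (NCBI table 1), built in canonical TCAG codon order.
# '*' marks stop codons. This table choice is an assumption of the sketch.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate DNA codon by codon; codons with non-ACGT bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def map_peptide(peptide, genome):
    """Hypothetical helper: find a peptide in all six reading frames.

    Returns sorted (strand, start, end) tuples in 0-based, half-open
    coordinates on the forward strand of `genome`.
    """
    hits = []
    n = len(genome)
    reverse_complement = genome.translate(COMPLEMENT)[::-1]
    for strand, seq in (("+", genome), ("-", reverse_complement)):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-complement coordinates back to forward ones.
                    start, end = n - end, n - start
                hits.append((strand, start, end))
                pos = protein.find(peptide, pos + 1)
    return sorted(hits)
```

For example, `map_peptide("AKG", "ATGGCTAAAGGTTAA")` reports a single forward-strand hit spanning nucleotides 3 to 12. A production tool would also need to handle ambiguities that exact matching ignores, such as peptides spanning splice junctions.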
doi:10.1016/j.ins.2016.08.005