AMS 4.0: consensus prediction of post-translational modifications in protein sequences

Dariusz Plewczynski, Subhadip Basu, Indrajit Saha
2012 Amino Acids  
We present the 2011 update of the Auto-Motif Service (AMS 4.0), which predicts a wide selection of 88 different types of single amino acid post-translational modifications (PTM) in protein sequences. The experimentally confirmed modifications used for training are taken from the latest UniProt and Phospho.ELM databases. The sequence vicinity of each modified residue is represented using amino acid physico-chemical features encoded using high quality indices (HQI) obtaining
... automatic clustering of known indices extracted from the AAindex database. For each type of numerical representation, the method builds an ensemble of Multi-Layer Perceptron (MLP) pattern classifiers, each optimising a different objective during training (for example recall, precision or area under the ROC curve (AUC)). The consensus is built using brainstorming technology, which combines multi-objective instances of machine learning algorithms with data fusion of different training object representations, in order to boost the overall prediction accuracy for conserved short sequence motifs. The performance of AMS 4.0 is compared with the accuracy of previous versions, which were constructed using single machine learning methods (artificial neural networks, support vector machines). Our software improves the average AUC score of the earlier version by close to 7%, as calculated on the test datasets of all 88 PTM types. Moreover, for selected most-difficult sequence motif types it improves the prediction performance by almost 32% compared with the previously used single machine learning methods. In summary, the brainstorming consensus meta-learning methodology boosts the AUC score to around 89%, averaged over all 88 PTM types. Detailed results for single machine learning methods and the consensus methodology are also provided, together with a comparison to previously published methods and state-of-the-art software tools. The source code and precompiled binaries of the brainstorming tool are available at http://code.google.com/p/automotifserver/ under Apache 2.0 licensing.

Keywords Post-translational modifications · AMS-4 · High quality indices · MLP · Consensus

Background

Post-translational modification (PTM) is a chemical modification of a protein after its translation. During protein synthesis, a protein is built from basic blocks of twenty different amino acids.
The modification then takes place by attaching other biochemical functional groups to them, such as acetate, phosphate, various lipids and carbohydrates, by changing the chemical nature of an

D. Plewczynski and S. Basu contributed equally to this work.

Electronic supplementary material The online version of this article (
doi:10.1007/s00726-012-1290-2 pmid:22555647 pmcid:PMC3397139 fatcat:aulz6lihjrbydmbxqrqztlvjrq
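
The abstract describes representing the sequence vicinity of a modified residue by physico-chemical index values and combining several classifiers into a consensus. A minimal sketch of that idea follows, under stated assumptions: the Kyte-Doolittle hydropathy scale stands in for one of the paper's high quality indices (the actual HQI set is not listed here), the window half-width of 4 and the zero padding are hypothetical choices, and the plain mean used as a consensus is a simplification, not the brainstorming method of AMS 4.0. The function names `encode_window` and `consensus_score` are illustrative only.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in index.
# Assumption: the paper's HQI are different indices derived from AAindex.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_window(sequence, position, half_width=4, pad=0.0):
    """Encode the residues around `position` (0-based) as index values.

    Positions outside the sequence, and residues missing from the
    index table, are encoded with `pad` (a hypothetical convention).
    """
    features = []
    for i in range(position - half_width, position + half_width + 1):
        if 0 <= i < len(sequence):
            features.append(KYTE_DOOLITTLE.get(sequence[i], pad))
        else:
            features.append(pad)
    return features


def consensus_score(scores):
    """Combine per-classifier scores by a plain mean (a simplification)."""
    if not scores:
        raise ValueError("at least one classifier score is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy peptide with a candidate serine at index 2.
    vector = encode_window("MKSPT", 2)
    print(len(vector), vector)
    # Scores from three hypothetical classifiers for the same site.
    print(consensus_score([0.2, 0.4, 0.9]))
```

Each window becomes a fixed-length numeric vector (here 2 × 4 + 1 = 9 values), which is the kind of input a multi-layer perceptron expects. In the setting the abstract describes, each index would yield its own vector and its own ensemble of classifiers before the consensus step.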