Modeling the functional consequences of single residue replacements in bacteriophage f1 gene V protein

M. Masso, E. Mathe, N. Parvez, K. Hijazi, I. I. Vaisman
2009 Protein Engineering Design & Selection  
A computational mutagenesis methodology utilizing a four-body, knowledge-based, statistical contact potential is applied toward globally quantifying relative environmental perturbations (residual scores) in bacteriophage f1 gene V protein (GVP) due to single amino acid substitutions. We show that residual scores correlate well with experimentally measured relative changes in protein function upon mutation. Residual scores also distinguish between GVP amino acid positions grouped according to protein structural or functional roles or based on similarities in physicochemical characteristics. For each mutant, the in silico mutagenesis additionally yields local measures of environmental change (EC scores) occurring at every residue position (residual profile) relative to the native protein. Implementation of the random forest (RF) algorithm, utilizing experimental GVP mutants whose feature vector components include EC scores at the mutated position and at six structurally nearest neighbors, correctly classifies mutants based on function with up to 77% cross-validation accuracy while achieving 0.82 area under the receiver operating characteristic curve. A control experiment highlights the effectiveness of mutant feature vector signals, and a variety of learning curves are generated to analyze the impact of GVP mutant data set size on performance measures. An optimally trained RF model is subsequently used for inferring function for all the remaining unexplored GVP mutants.

Edited by Richard Goldstein

Fig. 7. Learning curves. Error bars represent +1 SD from the mean.
Fig. 8. GVP mutational array. Columns, native amino acids; rows, substitutions; darker shades, experimental mutants; lighter shades, predicted mutants; white squares, self-substitutions; boxed numbers, DNA/RNA binding residues; shaded numbers, interface residues; boxed and shaded numbers, hydrophobic core residues. See Supplementary data available at PEDS online for color figure.
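
The feature vectors described in the abstract (the EC score at the mutated position plus the EC scores at its six structurally nearest neighbors) might be assembled roughly as in the sketch below. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy coordinates, and the EC values are hypothetical, and neighbors are ranked here by plain Euclidean distance between one representative point per residue, which the abstract does not specify.

```python
import math

def nearest_neighbors(coords, pos, k=6):
    """Return the k residue positions closest to `pos`.

    Assumption: closeness is Euclidean distance between one representative
    point per residue; ties are broken by the lower position number.
    """
    others = [p for p in coords if p != pos]
    others.sort(key=lambda p: (math.dist(coords[pos], coords[p]), p))
    return others[:k]

def feature_vector(ec_profile, coords, mutated_pos, k=6):
    """Build a (k + 1)-component vector: the EC score at the mutated
    position followed by the EC scores at its k nearest neighbors."""
    neighbors = nearest_neighbors(coords, mutated_pos, k)
    return [ec_profile[mutated_pos]] + [ec_profile[p] for p in neighbors]

# Hypothetical toy data: eight residues placed along a line, and a made-up
# residual profile (EC score per position) for one mutant at position 4.
toy_coords = {i: (float(i), 0.0, 0.0) for i in range(1, 9)}
toy_profile = {1: 0.0, 2: -0.5, 3: -1.25, 4: -3.0,
               5: -0.75, 6: 0.25, 7: 0.0, 8: 0.5}

toy_neighbors = nearest_neighbors(toy_coords, 4)
toy_vector = feature_vector(toy_profile, toy_coords, 4)
print(toy_neighbors)
print(toy_vector)
```

One such seven-component vector per experimental mutant, paired with its functional class label, would then form a row of the training set for a random forest implementation such as scikit-learn's `RandomForestClassifier`.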
doi:10.1093/protein/gzp050 pmid:19690089 fatcat:wzxj2hej3zcg5dl3g26ufmutgy