Prediction of protease substrates using sequence and structure features

David T. Barkan, Daniel R. Hostetter, Sami Mahrus, Ursula Pieper, James A. Wells, Charles S. Craik, Andrej Sali
2010 Computer applications in the biosciences : CABIOS  
Motivation: Granzyme B (GrB) and caspases cleave specific protein substrates to induce apoptosis in virally infected and neoplastic cells. While substrates for both types of proteases have been determined experimentally, there are many more yet to be discovered in humans and other metazoans. Here, we present a bioinformatics method based on support vector machine (SVM) learning that identifies sequence and structural features important for protease recognition of substrate peptides and then ... these features to predict novel substrates. Our approach can act as a convenient hypothesis generator, guiding future experiments by high-confidence identification of peptide-protein partners. Results: The method is benchmarked on the known substrates of both protease types, including our literature-curated GrB substrate set (GrBah). On these benchmark sets, the method outperforms a number of other methods that consider sequence only, achieving a true positive rate of 0.87 and a false positive rate of 0.13 for caspase substrates, and a true positive rate of 0.79 and a false positive rate of 0.21 for GrB substrates. The method is then applied to ~25,000 proteins in the human proteome to generate a ranked list of predicted substrates of each protease type. Two of these predictions, AIF-1 and SMN1, were selected for further experimental analysis, and each was validated as a GrB substrate. Availability: All predictions for both protease types are publicly available at http://modbase.compbio.ucsf.edu/peptide/. A web server at the same site allows a user to train new SVM models to make predictions for any protein that recognizes specific oligopeptide ligands.
doi:10.1093/bioinformatics/btq267 pmid:20505003 pmcid:PMC2894511 fatcat:3ji5tts7ajhhzoxg5lyas2nsl4