Protein Disorder Prediction

Rune Linding, Lars Juhl Jensen, Francesca Diella, Peer Bork, Toby J Gibson, Robert B Russell
2003 Structure  
Biocomputing Unit, Meyerhofstr 1, D-69117 Heidelberg, Germany
2 Max-Delbrück-Centre für Molecular Medicine, Robert-Rössle-Strasse 10, D-13092 Berlin, Germany
3 CellZome GmbH, Meyerhofstr 1, D-69117 Heidelberg, Germany
*Correspondence: linding@embl.de
4 These authors contributed equally to this work.

Summary

A great challenge in the proteomics and structural genomics era is to predict protein structure and function, including identification of those proteins that are partially or wholly unstructured. Disordered regions in proteins often contain short linear peptide motifs (e.g., SH3 ligands and targeting signals) that are important for protein function. We present here DisEMBL, a computational tool for prediction of disordered/unstructured regions within a protein sequence. As no clear definition of disorder exists, we have developed parameters based on several alternative definitions and introduced a new one based on the concept of "hot loops," i.e., coils with high temperature factors. Avoiding potentially disordered segments in protein expression constructs can increase expression, foldability, and stability of the expressed protein. DisEMBL is thus useful for target selection and the design of constructs as needed for many biochemical studies, particularly structural biology and structural genomics projects. The tool is freely available via a web interface (http://dis.embl.de) and can be downloaded for use in large-scale studies.

Introduction

In the post-genomic era, discovery of novel domains and functional sites in proteins is of growing importance. One focus of structural genomics initiatives is to solve structures for novel domains and thereby increase the coverage of fold and structure space (Brenner, 2000). During the target selection process in structural genomics/biology, intrinsic protein disorder is important to consider, since disordered regions at the N and C termini (or even within domains) often lead to difficulties in protein expression, purification, and crystallization. It is therefore essential to be able to predict which regions of a target protein are potentially disordered/unstructured. Computational tools to help discern ordered globular domains from disordered regions are key to such efforts.

It is becoming increasingly clear that many functionally important protein segments occur outside of globular domains (Wright and Dyson, 1999; Dunker et al., 2002). Protein structure and function space is partitioned into two subspaces. The first consists of globular units with binding pockets, active sites, and interaction surfaces. The second subspace contains nonglobular segments such as sorting signals, posttranslational modification sites, and protein ligands (e.g., SH3 ligands). Globular units are built of regular secondary structure elements and contribute the majority of the structural data deposited in PDB. In contrast, the nonglobular subspace encompasses disordered, unstructured, and flexible regions without regular secondary structure. Functional sites within the nonglobular space are known as linear motifs (cataloged by ELM [http://elm.eu.org]) (Puntervoll et al., 2003).

There are also many recent reports of Intrinsically Disordered Proteins (IDPs, also known as Intrinsically Unstructured Proteins). These are proteins or domains that, in their native state, are either completely disordered or contain large disordered regions. More than 100 such proteins are known, including Tau, Prions, Bcl-2, p53, 4E-BP1, and eIF1A (see Figure 4) (Tompa, 2002; Uversky, 2002). Protein disorder is important for understanding protein function as well as protein folding pathways (Plaxco and Gross, 2001; Verkhivker et al., 2003). Although little is understood about the cellular and structural meaning of IDPs, they are thought to become ordered only when bound to another molecule (e.g., CREB-CBP complex [Radhakrishnan et al., 1997]) or owing to changes in the biochemical environment (Dunker et al., 2001, 2002; Uversky, 2002).

The current view on disorder is that disordered proteins are disordered to allow for more interaction partners and modification sites (Wright and Dyson, 1999; Liu et al., 2002; Tompa, 2002). It has also been suggested that disordered proteins exist to provide a simple solution to having large intermolecular interfaces while keeping smaller protein, genome, and cell sizes (Gunasekaran et al., 2003). It has been noted that having several relatively low-affinity linear interaction sites allows for flexible, subtle regulation and can account for specificity with fewer linear motif types (Evans and Owen, 2002). It has also been demonstrated that protein disorder plays a central role in biology and in diseases mediated by protein misfolding and aggregation (Schweers et al., 1994; Kaplan et al., 2003; Bates, 2003).

No commonly agreed definition of protein disorder exists. The thermodynamic definition of disorder in a polypeptide chain is the "random coil" structural state. The random coil state can best be understood as the structural ensemble spanned by a given polypeptide in which all degrees of freedom are used within the conformational space. However, even under extremely ...

References

... C., Nielsen, H., Staerfeldt, H.H., Rapacki, K., Workman, C., et al. (2002). Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol. 319, 1257-1265.
Kabsch, W., and Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637.
Kaplan, B., Ratner, V., and Haas, E. (2003). alpha-Synuclein: Its biological function and role in neurodegenerative diseases. J. Mol. ...
... GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res. 31, 3701-3708.
... automated segmentation using complexity measures. Comput. Chem. 18, 269-285.
Wright, P., and Dyson, H. (1999). Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol. 293, 321-331.
Zoete, V., Michielin, O., and Karplus, M. (2002). Relation between sequence and structure of HIV-1 protease inhibitor complexes: a model system for the analysis of protein flexibility. J. Mol. Biol. 315, ...
doi:10.1016/j.str.2003.10.002 pmid:14604535 fatcat:wqoqz5cvgvfnthpyw7x6varca4