A FAMILY CLASSIFICATION APPROACH TO FUNCTIONAL ANNOTATION OF PROTEINS [chapter]

Cathy H. Wu, Winona C. Barker
2004 The Practical Bioinformatician  
The high-throughput genome projects have resulted in a rapid accumulation of genome sequences for a large number of organisms. To fully realize the value of the data, scientists need to identify proteins encoded by these genomes and understand how these proteins function in making up a living cell. With experimentally verified information on protein function lagging far behind, computational methods are needed for reliable and large-scale functional annotation of proteins. A general approach for functional characterization of unknown proteins is to infer protein functions based on sequence similarity to annotated proteins in sequence databases. While this is a powerful approach that has led to many scientific discoveries, accurate annotation often requires the use of a variety of algorithms and databases, coupled with manual curation. This complex and ambiguous process is inevitably error-prone. Indeed, numerous genome annotation errors have been detected, many of which have been propagated throughout other molecular databases. There are several sources of errors. Since many proteins are multifunctional, the assignment of a single function, which is still common in genome projects, results in incomplete or incorrect information. Errors also often occur when the best hit in pairwise sequence similarity searches is an uncharacterized or poorly annotated protein, is itself incorrectly predicted, or simply has a different function. The Protein Information Resource (PIR) provides an integrated public resource of protein informatics to support genomic and proteomic research and scientific discovery. PIR produces the Protein Sequence Database (PSD) of functionally annotated protein sequences, which grew out of the Atlas of Protein Sequence and Structure edited by Margaret Dayhoff. The annotation problems are addressed by a classification-driven and rule-based method with evidence at-
doi:10.1142/9789812562340_0019 fatcat:l7agtzoktbailpq6y3z7oqckma