Large-scale, classification-driven, rule-based functional annotation of proteins [chapter]

Darren A. Natale, C. R. Vinayaka, Cathy H. Wu
2005 Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics  
Experimentally verified information on protein function lags far behind the rapid accumulation of protein sequences. The simple approach to propagating information from characterized proteins to unknown proteins (namely, sequence similarity search against databases of individual proteins) may fail to produce accurate results and is typically used to transfer only protein name information. A more accurate, consistent, and comprehensive approach for large-scale automated annotation makes use of protein family classification-driven rules. Unannotated proteins that satisfy a set of conditions for a particular rule can be annotated with the information appropriate for that rule. The approach leads to facile, accurate prediction and functional inference for uncharacterized proteins, allows systematic detection of genome annotation errors, and provides sensible propagation and standardization of protein annotation, including position-specific sequence features, protein names and synonyms, and Gene Ontology terms. Rule-based annotation will be discussed in the context of the PIRSF protein classification system, the PIRNR Name Rule system, and the PIRSR Site Rule system.
doi:10.1002/047001153x.g403314 fatcat:lhwkwad26zcttpgw5be2h5eijq