PIRSF Family Classification System for Protein Functional and Evolutionary Analysis

Anastasia N. Nikolskaya, Cecilia N. Arighi, Hongzhan Huang, Winona C. Barker, Cathy H. Wu
Evolutionary Bioinformatics (2006)
The PIRSF protein classification system (http://pir.georgetown.edu/pirsf/) reflects evolutionary relationships of full-length proteins and domains. The primary PIRSF classification unit is the homeomorphic family, whose members are both homologous (evolved from a common ancestor) and homeomorphic (sharing full-length sequence similarity and a common domain architecture). PIRSF families are curated systematically based on literature review and integrative sequence and functional analysis, ... sequence and structure similarity, domain architecture, functional association, genome context, and phyletic pattern. The results of classification and expert annotation are summarized in PIRSF family reports with graphical viewers for taxonomic distribution, domain architecture, family hierarchy, multiple alignments and phylogenetic trees. The PIRSF system provides a comprehensive resource for bioinformatics analysis and comparative studies of protein function and evolution. Domain- or fold-based searches allow identification of evolutionarily related protein families that share domains or structural folds. Functional convergence and functional divergence are revealed by the relationships between protein classification and curated family functions. The taxonomic distribution allows the identification of lineage-specific or broadly conserved protein families and can reveal horizontal gene transfer. Here we demonstrate, with illustrative examples, how to use the web-based PIRSF system as a tool for functional and evolutionary studies of protein families.
Protein classification is widely accepted as providing valuable clues to structure, function and evolution. Protein family classification has several advantages as a basic approach for large-scale annotation: (i) it improves the annotation of proteins that are difficult to characterize based on pairwise alignments; (ii) it assists database maintenance by promoting family-based propagation of annotation and making annotation errors apparent; (iii) it provides an effective means to retrieve relevant biological information from vast amounts of data; and (iv) it reflects the underlying gene families, the analysis of which is essential for comparative genomics and phylogenetics.
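The homeomorphic-family criterion and the use of taxonomic distribution can be illustrated with a minimal Python sketch. It assumes a hypothetical `Protein` record carrying an ordered domain architecture and a superkingdom label, treats homology as already established, and approximates "full-length sequence similarity" by requiring a prior alignment that covers at least 80% of both sequences. The threshold, field names, domain labels and profile wording are illustrative assumptions, not the criteria PIRSF curators actually apply.

```python
from dataclasses import dataclass

# Hypothetical coverage threshold, chosen for illustration only.
MIN_COVERAGE = 0.8


@dataclass(frozen=True)
class Protein:
    accession: str                 # hypothetical identifier
    length: int                    # sequence length in residues
    architecture: tuple[str, ...]  # ordered domain names, N- to C-terminus
    superkingdom: str              # e.g. "Archaea", "Bacteria", "Eukaryota"


def is_homeomorphic(a: Protein, b: Protein, aligned_length: int) -> bool:
    """Return True if a and b share a domain architecture and full-length similarity.

    aligned_length is assumed to come from an earlier full-length alignment;
    homology itself is taken as given here, not tested.
    """
    # A common domain architecture means the same domains in the same order.
    if a.architecture != b.architecture:
        return False
    # Full-length similarity: the alignment must span most of BOTH sequences.
    coverage = min(aligned_length / a.length, aligned_length / b.length)
    return coverage >= MIN_COVERAGE


def taxonomic_profile(members: list[Protein]) -> str:
    """Summarize the phyletic spread of a family at the superkingdom level."""
    kingdoms = {p.superkingdom for p in members}
    if len(kingdoms) == 1:
        return "lineage-specific (" + next(iter(kingdoms)) + ")"
    if kingdoms >= {"Archaea", "Bacteria", "Eukaryota"}:
        return "broadly conserved"
    return "restricted to " + ", ".join(sorted(kingdoms))


if __name__ == "__main__":
    p1 = Protein("P1", 300, ("Kinase", "SH2"), "Eukaryota")
    p2 = Protein("P2", 320, ("Kinase", "SH2"), "Eukaryota")
    p3 = Protein("P3", 310, ("SH2", "Kinase"), "Eukaryota")
    print(is_homeomorphic(p1, p2, aligned_length=280))  # True: same order, >80% coverage
    print(is_homeomorphic(p1, p3, aligned_length=280))  # False: domain order differs
    print(taxonomic_profile([p1, p2]))                  # lineage-specific (Eukaryota)
```

Comparing architectures as ordered tuples captures the point that two proteins with the same domains in a different order, or with an extra domain, fall outside one homeomorphic family even if they are homologous over part of their length. A real analysis would work at finer taxonomic ranks and weigh unexpected members (for example, a few bacterial sequences in an otherwise eukaryotic family) as possible evidence of horizontal gene transfer.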
To facilitate accurate, consistent and rich functional annotation of proteins, the Protein Information Resource (PIR, http://pir.georgetown.edu/) employs a classification-driven annotation method supported by a bioinformatics framework that provides data integration and associative analysis. This paper describes the PIRSF family classification and functional annotation approaches and illustrates how manually curated protein families can be used to support protein functional and evolutionary studies via the PIRSF web site at
doi:10.1177/117693430600200033