The SUPERFAMILY database in 2004: additions and improvements

M. Madera
2004 Nucleic Acids Research  
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
doi:10.1093/nar/gkh117 pmid:14681402 pmcid:PMC308851 fatcat:ja7yohpxorgrbdq7nhm4xcf4qq
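
The genome comparison mentioned in the abstract (finding a genome's superfamily composition, then looking for superfamilies that are over- or under-represented relative to another genome) can be illustrated with a minimal sketch. This is not the method used by the SUPERFAMILY server, which the abstract does not describe. It assumes domain assignments are available as (protein_id, superfamily_id) pairs and uses a plain ratio of pseudocount-smoothed relative frequencies rather than a statistical test. The function names, the pseudocount, and the labels `sf_A`, `sf_B`, `sf_C` are hypothetical.

```python
from collections import Counter


def superfamily_composition(assignments):
    """Count domain assignments per superfamily for one genome.

    `assignments` is an iterable of (protein_id, superfamily_id) pairs;
    this input format is an assumption made for the sketch.
    """
    return Counter(superfamily for _, superfamily in assignments)


def representation_ratios(target, background, pseudocount=1.0):
    """Ratio of each superfamily's relative frequency in `target`
    to its relative frequency in `background`.

    Values above 1 suggest over-representation in the target genome,
    values below 1 suggest under-representation. The pseudocount keeps
    the ratio defined for superfamilies missing from one genome.
    """
    families = set(target) | set(background)
    target_total = sum(target.values()) + pseudocount * len(families)
    background_total = sum(background.values()) + pseudocount * len(families)
    ratios = {}
    for family in families:
        target_freq = (target.get(family, 0) + pseudocount) / target_total
        background_freq = (background.get(family, 0) + pseudocount) / background_total
        ratios[family] = target_freq / background_freq
    return ratios


# Toy data with hypothetical protein and superfamily labels.
genome_x = [("p1", "sf_A"), ("p2", "sf_A"), ("p3", "sf_A"), ("p4", "sf_B")]
genome_y = [("q1", "sf_A"), ("q2", "sf_B"), ("q3", "sf_B"), ("q4", "sf_C")]

composition_x = superfamily_composition(genome_x)
composition_y = superfamily_composition(genome_y)
ratios = representation_ratios(composition_x, composition_y)

# List superfamilies from most over-represented to most under-represented.
for family, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{family}\t{ratio:.2f}")
```

A ratio like this is only a first look: with real genome-scale counts, a proper significance test and correction for genome size would be needed before calling a superfamily over- or under-represented.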