Immunogenetic sequence annotation based on IMGT-ONTOLOGY

Joumana Jabado-Michaloud, Marie-Paule Lefranc, Géraldine Folch, Fatena Bellahcene, François Ehrenmann, Patrice Duroux, Véronique Giudicelli
2009 Nature Precedings  
IMGT/LIGM-DB [1] is the first and the largest database of the IMGT® information system® [2]. It manages, analyses and annotates more than 136,000 immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences from human and 235 other vertebrate species (April 2009). The expert annotation of these sequences and the added standardized knowledge are based on IMGT-ONTOLOGY, the first ontology developed in the field of immunogenetics and immunoinformatics [3]. The annotation of immunogenetic sequences requires considerable expertise, owing to the unusual (non-classical exon/intron) structure of the IG and TR genes and to the characteristic chain synthesis that results from DNA V-J and V-D-J rearrangements. How a sequence is annotated depends on its molecular type (gDNA, mRNA, cDNA or protein), on its configuration type (germline or rearranged), and on whether sequences from the species concerned are present in the IMGT reference directory sets. IMGT/V-QUEST [5] and internal tools (IMGT/Automat, IMGT/LIGMotif, IMGT/BLAST and IMGT/DomainGapAlign) were developed for this purpose. The first step in annotation identifies the chain type (for instance IG-Heavy) and assigns standardized keywords (IDENTIFICATION axiom). The second step is the classification of IG and TR genes and alleles (CLASSIFICATION axiom). The third step is the description (DESCRIPTION axiom) of the V, D, J and C genes and alleles with specific standardized labels. There are more than 590 IMGT standardized labels, of which 64 have been entered in Sequence Ontology (SO). The delimitation of the FR-IMGT and CDR-IMGT lengths and the positions of conserved amino acids, based on the IMGT unique numbering (NUMEROTATION axiom), make it possible to bridge the gap between sequences and 3D structures [6]. The complete annotation of immunogenetic germline (V, D, J) and C sequences is followed by the update of the IMGT Repertoire (IMGT Gene tables, Alignments of alleles, Protein displays, Colliers de Perles, etc.), of the IMGT® gene database (IMGT/GENE-DB) and of the IMGT reference directory sets of the IMGT® tools (IMGT/V-QUEST, IMGT/JunctionAnalysis and IMGT/DomainGapAlign).
[1] Giudicelli, V. et al., Nucleic Acids Res., 34, D781-784 (2006).
[2] Lefranc, M.-P. et al., Nucleic Acids Res., 37, D1006-1012 (2009).

The NUMEROTATION axiom and the concepts of numerotation determine the principles of a unique numbering for a domain (sequences and 3D structures).
The "IMGT_unique_numbering" concept is illustrated by the "IMGT_Collier_de_Perles" concept, which allows graphical representation in two dimensions (2D) of amino acid sequences of V type [1], C type [2] or G type [3] domains.
[1] Lefranc, M.-P. et al., Dev. Comp. Immunol., 27, 55-77 (2003).
[2] Lefranc, M.-P. et al., Dev. Comp. Immunol., 29, 185-203 (2005).
[3] Lefranc, M.-P. et al., Dev. Comp. Immunol., 29, 917-938 (2005).
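As an illustration of how a unique numbering lets FR-IMGT and CDR-IMGT regions be delimited by position alone, the following is a minimal Python sketch. It assumes the V-domain region boundaries and conserved positions as commonly published for the IMGT unique numbering (Dev. Comp. Immunol., 2003), and these values should be checked against the IMGT documentation before use. The function names `region_of` and `cdr_lengths` are hypothetical and are not part of any IMGT tool. Additional positions used for long CDR3-IMGT loops are deliberately ignored.

```python
from typing import Dict, Tuple

# Assumed region boundaries of the IMGT unique numbering for V domains.
# Positions are 1-based and inclusive; verify against IMGT documentation.
V_DOMAIN_REGIONS = [
    ("FR1-IMGT", 1, 26),
    ("CDR1-IMGT", 27, 38),
    ("FR2-IMGT", 39, 55),
    ("CDR2-IMGT", 56, 65),
    ("FR3-IMGT", 66, 104),
    ("CDR3-IMGT", 105, 117),
    ("FR4-IMGT", 118, 128),
]

# Assumed conserved positions in a V domain under this numbering.
CONSERVED_POSITIONS = {
    23: "1st-CYS",
    41: "CONSERVED-TRP",
    104: "2nd-CYS",
    118: "J-PHE or J-TRP",
}


def region_of(position: int) -> str:
    """Return the FR-IMGT or CDR-IMGT label that contains a position."""
    for label, start, end in V_DOMAIN_REGIONS:
        if start <= position <= end:
            return label
    raise ValueError(f"position {position} is outside the numbered V domain")


def cdr_lengths(numbered: Dict[int, str]) -> Tuple[int, int, int]:
    """Count occupied positions in CDR1-, CDR2- and CDR3-IMGT.

    `numbered` maps occupied positions to amino acids; unoccupied
    positions (gaps) are simply absent from the mapping.
    """
    counts = {"CDR1-IMGT": 0, "CDR2-IMGT": 0, "CDR3-IMGT": 0}
    for position in numbered:
        label = region_of(position)
        if label in counts:
            counts[label] += 1
    return counts["CDR1-IMGT"], counts["CDR2-IMGT"], counts["CDR3-IMGT"]


if __name__ == "__main__":
    # Toy input: a handful of occupied positions, not a real sequence.
    toy = {23: "C", 27: "G", 28: "F", 38: "Y", 41: "W", 56: "I", 104: "C", 105: "A", 117: "W"}
    print(region_of(104), cdr_lengths(toy))
```

Because gaps are represented by absent positions, the CDR-IMGT lengths fall out of a simple count. The same property is what allows sequences of different lengths to be compared position by position.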
doi:10.1038/npre.2009.3165.1 fatcat:4dtmiolo45hsvhy5lwwdecgbyi