The Genexpress Index: a resource for gene discovery and the genic map of the human genome

R Houlgatte, R Mariage-Samson, S Duprat, A Tessier, S Bentolila, B Lamy, C Auffray
1995 Genome Research  
Detailed analysis of a set of 18,698 sequences derived from both ends of 10,979 human skeletal muscle and brain cDNA clones defined 6676 functional families, characterized by their sequence signatures over 5750 distinct human gene transcripts. About half of these genes have been assigned to specific chromosomes utilizing 2733 eSTS markers, the polymerase chain reaction, and DNA from human-rodent somatic cell hybrids. Sequence and clone clustering and a functional classification, together with comprehensive database searches and annotations, made it possible to develop extensive sequence and map cross-indexes, define electronic expression profiles, identify a new set of overlapping genes, and provide numerous new candidate genes for human pathologies.

Genome Research 5:272-304. ©1995 by Cold Spring Harbor Laboratory Press. ISSN 1054-9803/95.

Figure 1. Sequence clustering strategy. Partial sequences (1-9) are represented by arrows below the mRNA from which they were derived. Broken lines indicate individual clones. Sequence analysis identified redundant sequences (>90% similar over their entire lengths), such as sequence 6 (identical to sequence 5) and sequence 8 (identical to sequence 7). These redundant sequences were not analyzed further. All nonredundant sequences were compared with each other to find overlapping sequences (detected by FASTA with an Opt parameter >120 and >90% identities, and validated by users), such as sequences 5 and 7, 2 and 3, and 3 and 4. This allowed us to cluster sequences into contigs, defined as sets of sequences linked either by redundancy or by overlaps (sequences 1, 2-4, 5-8, 9), and to cluster together some sequences that could not be aligned (sequences 2 and 4) because of a low-quality segment or alternative splicing (sequence 4), if a third sequence (sequence 3) overlapped with them.
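The contig definition in the Figure 1 caption (sets of sequences linked by redundancy or by overlaps, with links applied transitively) amounts to finding connected components. The following is a minimal sketch of that idea, not the authors' actual pipeline: the function names, the union-find approach, and the pairwise scores are all hypothetical, and the user validation step mentioned in the caption is omitted. Only the thresholds (Opt >120, >90% identities) come from the caption.

```python
from collections import defaultdict

OPT_MIN = 120        # FASTA Opt threshold quoted in the Figure 1 caption
IDENTITY_MIN = 0.90  # identity threshold quoted in the Figure 1 caption


def is_overlap(opt_score, identity):
    """Apply the caption's overlap criteria to one pairwise comparison."""
    return opt_score > OPT_MIN and identity > IDENTITY_MIN


def cluster(sequence_ids, links):
    """Group sequences into contigs, i.e. connected components over links."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent pointers to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for s in sequence_ids:
        groups[find(s)].append(s)
    return sorted(sorted(group) for group in groups.values())


# Redundant pairs named in the caption (6 matches 5, 8 matches 7).
redundant_pairs = [(5, 6), (7, 8)]

# Hypothetical pairwise results: (seq_a, seq_b, opt_score, identity).
# The scores are invented for illustration; only the pairings follow the caption.
hits = [
    (5, 7, 150, 0.95),
    (2, 3, 200, 0.97),
    (3, 4, 180, 0.93),
    (2, 4, 300, 0.60),  # fails the identity threshold: 2 and 4 do not align
    (1, 9, 90, 0.95),   # fails the Opt threshold
]

overlap_pairs = [(a, b) for a, b, opt, ident in hits if is_overlap(opt, ident)]
contigs = cluster(range(1, 10), redundant_pairs + overlap_pairs)
print(contigs)  # [[1], [2, 3, 4], [5, 6, 7, 8], [9]]
```

Sequences 2 and 4 land in the same contig even though they are not directly linked, because sequence 3 overlaps both. This mirrors the caption's note about sequences that could not be aligned with each other.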
doi:10.1101/gr.5.3.272 pmid:8593614 fatcat:d4kursrtbrcgdjucne2msinf6u