The FANTOM5 collection, a data series underpinning mammalian transcriptome atlases in diverse cell types

Hideya Kawaji, Takeya Kasukawa, Alistair Forrest, Piero Carninci, Yoshihide Hayashizaki
2017 Scientific Data  
The latest project from the FANTOM consortium, an international collaborative effort initiated by RIKEN, generated atlases of transcriptomes, in particular promoters, transcribed enhancers, and long noncoding RNAs, across a diverse set of mammalian cell types. Here, we introduce the FANTOM5 collection, bringing together data descriptors, articles and analyses of FANTOM5 data published across the Nature Research journals. Associated data are openly available for reuse by all.
Our genomes contain the complete set of information necessary to specify our development from a single totipotent cell to a complex multicellular organism, composed of hundreds of specialized cell types able to respond to environmental changes. In each of these cell types, and in each of their response states, a different set of genes is expressed through transcription. Determining the transcriptome, including the set of genes expressed, is fundamental to understanding cellular identity, gene regulation and human disease.
The FANTOM (Functional Annotation of Mammalian Genomes) project was launched to provide a comprehensive catalogue of transcripts encoded in mammalian genomes (http://fantom.gsc.riken.jp). Using the full-length cDNA technology developed at RIKEN [1], the first, second and third rounds of the FANTOM project surveyed the mammalian transcriptome landscape by sequencing a large collection of full-length cDNAs. This not only improved our catalogue of protein-coding genes but also revealed the new world of long non-protein-coding RNAs [2-7], a major novel class of genes that had been overlooked. The cap-trapper reaction, initially used to select full-length cDNAs, was later used to develop CAGE (Cap Analysis Gene Expression), which quantifies transcription start sites (TSSs) at single base-pair resolution [8]. With this method, the FANTOM3 project globally mapped TSSs in the mouse genome. This helped classify mammalian promoters into two architectures: broad, CpG-associated promoters and sharp, TATA-associated promoters [9]. Subsequently, the FANTOM4 project used CAGE and predicted transcription factor binding motifs in proximal promoters to decipher the transcriptional regulatory network of a myeloid leukemia cell line undergoing differentiation [10]. Additionally, the new CAGE data revealed that a large fraction of the transcriptome initiates from retrotransposon-derived sequences, and that these transcripts exhibit exquisite tissue specificity [6].
Most recently, the FANTOM5 project [11-13] aimed to generate comprehensive maps of transcription initiation activity across the most diverse collection of cell types studied to date. A focus on normal, primary cells distinguished FANTOM5 from previous transcriptome studies. Most other broad studies had focused on 1
doi:10.1038/sdata.2017.113 pmid:28850107 pmcid:PMC5574373 fatcat:5ooytyc2lndm3gvkx4kv6atvgy