Analysis of the Aspergillus nidulans transcriptome using high-throughput RNA sequencing

Christopher Sibthorp
2013
The filamentous fungus Aspergillus nidulans is a well-characterized model organism which has been used extensively for the study of eukaryotic cell biology and genetics over the past 60 years. The A. nidulans genome was sequenced in 2005, and various genome annotations have been released since, the majority of which rely heavily on in silico gene prediction. The development of high-throughput next generation sequencing technologies has revolutionised transcriptomics by allowing RNA analysis of whole transcriptomes through massively parallel cDNA sequencing (RNA-seq). This sequencing approach has been applied to the A. nidulans transcriptome, and augmented by the development of a novel strategy for selectively sequencing the 5′ ends of RNAs on the ABI SOLiD platform. This aimed to produce a more robust resource for gene interrogation and the investigation of regulatory elements which impact on the transcriptomal landscape in A. nidulans. Bioinformatic analysis of RNA-seq data was used to define 15,375 transcription start site (TSS) regions, which have been characterised by statistical analysis of mapped 5′ end distribution. Motif finding within sequence regions surrounding these TSS regions identified 16 putative functional promoter motifs based on overrepresentation and distributional analysis within promoters, and GO annotation found significant functional enrichment amongst genes associated with two of these motifs (AARARAAA and TTTYTTY). Transcript assembly of RNA-seq data has also revealed 16,065 putative transcripts, 1,112 of which were mapped to regions annotated as intergenic. From these transcripts we identified 38 strong candidates for novel protein coding genes (six of which contained non-canonical translation start sites), and over 400 additional transcripts containing putative coding regions. Separation of RNA-seq data into two sets of strand-specific reads was shown to greatly increase the quality of transcript assembly and facilitated the identification of 2,291 occurrences of sense:antisense overlap between assem [...]
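The two enriched motifs above are written with IUPAC degenerate nucleotide codes, where R stands for a purine (A or G) and Y for a pyrimidine (C or T). As a minimal sketch of how such a motif could be located in a promoter sequence, the Python below expands a motif into a regular expression and reports match positions. The function names and the example sequence are hypothetical illustrations; this is not the motif-finding method used in the thesis, which the abstract does not describe.

```python
import re

# Subset of IUPAC nucleotide codes needed for the two motifs in the abstract.
# R = purine (A or G), Y = pyrimidine (C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def motif_to_regex(motif):
    """Expand an IUPAC motif into a compiled regex.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are all reported.
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(motif, sequence):
    """Return the 0-based start positions of every match on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical sequence, made up for illustration only.
    example = "GGTTTCTTTAACCAAGAAAAATT"
    for motif in ("AARARAAA", "TTTYTTY"):
        print(motif, find_motif(motif, example))
```

This sketch scans one strand only; scanning the reverse complement as well would be needed to cover motifs on the opposite strand.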
doi:10.17638/00009973 fatcat:aeoxgx7pqbcfdo7jc7hotal2uu