RNA CaptureSeq: targeted sequencing for comprehensive transcriptome studies [thesis]

Marion Brunck
Although only a small proportion of the human genome encodes proteins, most of our DNA is transcribed into a myriad of different RNA species, collectively termed the transcriptome. Investigation of this pervasive transcription has unveiled unanticipated context-specific regulation and biological significance for some of these RNA transcripts. Therefore, the common perception of the transcriptome has evolved, from RNA being a simple mediator between the genetic code and proteins, to RNA being a multifaceted complex genetic element. The human genome is now regarded as a sophisticated combination of intertwined protein-coding and non-coding sequences, generating a transcriptome much more versatile and heterogeneous than previously envisioned. The transcriptome exhibits a wide dynamic range, with a few genes generating the majority of transcripts within a cell, whilst the majority of genes are expressed at low levels and have context-restricted expression. Investigating these rare transcripts has been hampered by inconsistent and limited coverage using standard next-generation sequencing. This thesis investigates the potential of using RNA CaptureSeq to expand the breadth of transcriptomic studies by reaching saturating coverage of an enriched selected fraction of the transcriptome. The RNA CaptureSeq protocol was optimised and is comprehensively described. In addition to the detailed laboratory procedure and the suggested method for analysing generated datasets, experimental design considerations, anticipated results and troubleshooting approaches are presented. Co-sequencing ERCC RNA Spike-In transcripts added to samples during processing acts as an internal control to validate coverage of transcripts of multiple lengths and concentrations. The power of RNA CaptureSeq was exploited to focus sequencing coverage towards transcripts originating from human chromosome 21 (Hsa 21) in the human myelogenous leukemia cell line K562, and in 3 primary tissues. The ERCC Spike-In RNAs of the lowest concentrations (10^-22 mole/L) were sequenced with a sufficient number of reads to enable accurate assembly. RNA CaptureSeq revealed extensive transcription of Hsa 21, with the vast majority of sequenced reads mapping to entirely novel introns and exons. The dataset includes a large fraction of novel isoforms and entirely novel transcripts.
These RNAs derive from regions previously considered intergenic, or spanning multiple annotated genes, and the complete dataset overall surpasses the existing GENCODE annotations. Analysis of transcription in brain, kidney and testis revealed a great prevalence of tissue-restricted transcription, especially for non-coding transcripts. RNA CaptureSeq was also applied to matching mouse tissues for syntenic regions to Hsa 21. This study expanded the catalogue of transcription in the mouse by at least 2.5-fold. While the number and density of genes were similar to human, the numbers of introns and lncRNAs were lower, which is in agreement with current theories for speciation. Furthermore, the newly annotated coding and non-coding genes showed a similar degree of evolutionary conservation to currently annotated sequences. In addition, the impact of cis-encoded regulation was questioned by probing Hsa 21 transcription in the mouse nuclear environment. RNA CaptureSeq was applied using Hsa 21-specific probes, to organs from the aneuploid mouse Tc1, which contains an additional copy of human chromosome 21 in its nuclei. The resulting dataset demonstrated the majority of human transcripts were present in Tc1 brain, kidney and testis nuclei. Therefore, the regulation of the expanded transcriptome revealed by CaptureSeq is mediated by local cis-encoded regulatory elements. This precise regulation of novel non-coding RNAs is inconsistent with previous assumptions of non-coding transcripts being spurious transcriptional noise. Altogether these studies suggest that the transcriptome is vastly greater even than inferred from ENCODE. RNA CaptureSeq exposes the currently unappreciated chromosome 21 transcriptome, as the large majority of sequenced reads map to entirely novel introns and exons. In addition to increasing the breadth of sequencing, RNA CaptureSeq also has the potential to resolve rare splice variants.
This capacity was explored in human hematopoietic cancer tissues, using probes targeting intron-exon boundaries and branchpoint sites of cancer-related genes. This experiment was initially designed to investigate aberrant splicing events in tumours exhibiting a mutation in the SF3B1 splicing factor, but is proposed to have a larger range of applications, such as evidencing breakpoint translocations in fusion genes. Collectively, this work demonstrates the huge potential and versatility of the RNA CaptureSeq method to expand the breadth of the analysed transcriptome to saturation. Exposing the full catalogue of transcripts originating from a defined genomic area provides the opportunity for more accurate and precise hypothesis-testing, and for developing conceptual advances in various fields of research including regulation of gene expression and oncogenesis.
doi:10.14264/uql.2015.692 fatcat:zob5iooaqzad3c7lcgndpxzssu