Single-molecule sequencing of long DNA molecules allows high contiguity de novo genome assembly for the fungus fly, Sciara coprophila [article]

John M Urban, Michael S Foulk, Jacob E Bliss, C. Michelle Coleman, Nanyan Lu, Reza Mazloom, Susan J Brown, Allan C Spradling, Susan A. Gerbi
2020 bioRxiv   pre-print
The lower Dipteran fungus fly, Sciara coprophila, has many unique biological features. For example, Sciara undergoes paternal chromosome elimination and maternal X chromosome nondisjunction during spermatogenesis, paternal X elimination during embryogenesis, intrachromosomal DNA amplification of DNA puff loci during larval development, and germline-limited chromosome elimination from all somatic cells. Paternal chromosome elimination in Sciara was the first observation of imprinting, though the mechanism remains a mystery. Here, we present the first draft genome sequence for Sciara coprophila to take a large step forward in aiding these studies. We approached assembling the Sciara genome using multiple sequencing technologies: PacBio, Oxford Nanopore MinION, and Illumina. To find an optimal assembly using these datasets, we generated 44 Illumina assemblies using 7 short-read assemblers and 50 long-read assemblies of PacBio and MinION sequence data using 6 long-read assemblers. We ranked assemblies using a battery of reference-free metrics, and scaffolded a subset of the highest-ranking assemblies using BioNano Genomics optical maps. RNA-seq datasets from multiple life stages and both sexes facilitated genome annotation. Moreover, we anchored nearly half of the Sciara genome sequence into chromosomes. Finally, we used the signal level of both the PacBio and Oxford Nanopore data to explore the presence or absence of DNA modifications in the Sciara genome since DNA modifications may play a role in imprinting in Sciara, as they do in mammals. These data serve as the foundation for future research by the growing community studying the unique features of this emerging model system.
doi:10.1101/2020.02.24.963009 fatcat:wz2qy362cfg6dlp53mwbvehtgm