HySA: A Hybrid Structural variant Assembly approach using next generation and single-molecule sequencing technologies [article]

Xian Fan, Mark Chaisson, Luay Nakhleh, Ken Chen
2016 biorxiv/medrxiv   pre-print
Achieving complete, accurate and cost-effective assembly of human genome is of great importance for realizing the promises of precision medicine. The abundance of repeats and genetic variations in human genome and the limitations of existing sequencing technologies call for the development of novel assembly methods that could leverage the complementary strengths of multiple technologies. We propose a Hybrid Structural variant Assembly (HySA) approach that integrates sequencing reads from next
more » ... g reads from next generation sequencing (NGS) and single-molecule sequencing (SMS) technologies to accurately assemble and detect structural variations (SV) in human genome. By identifying homologous SV-containing reads from different technologies through a bipartite-graph-based clustering algorithm, our approach turns a whole genome assembly problem into a set of independent SV assembly problems, each of which can be effectively solved to enhance assembly of structurally altered regions in human genome. In testing our approach using data generated from a haploid hydatidiform mole genome (CHM1) and a diploid human genome (NA12878), we found that our approach substantially improved the detection of many types of SVs, particularly novel large insertions, small INDELs (10-50bp) and short tandem repeat expansions and contractions over existing approaches with a low false discovery rate. Our work highlights the strengths and limitations of current approaches and provides an effective solution for extending the power of existing sequencing technologies for SV discovery.
doi:10.1101/069815 fatcat:b3yhbfsesbhj7pavqnt6rwreey