Algorithms for whole genome shotgun sequencing

Eric Anson, Gene Myers
1999 Proceedings of the third annual international conference on Computational molecular biology - RECOMB '99  
Essential to a whole-genome shotgun approach to DNA sequencing is the availability of pairs of short, unique sequence markers at a roughly estimated distance from each other. Determining the sequence of the genome can then be broken into a series of inter-marker assembly problems, each of which determines the sequence between a pair of markers. Unfortunately, marker pairs are not always correct, and repeats can greatly confound the assembly. This motivates our first problem: rapidly finding a set of linked contigs, called a scaffold, between a pair of markers that confirms both the marker pair and the traversability of the region between them. We then present an inter-marker assembly algorithm that determines the unique sequence segments between a marker pair. Both algorithms are evaluated with a simulation that can model the clustering of repeats and in which our only information about the presence of repeats is excessive coverage and the ability to detect their boundaries. Simulation results show that at 10x coverage one can find and assemble the unique sequence between markers more than 99.9% of the time.
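The scaffold-finding problem can be pictured as a bounded path search. The sketch below is an illustration under stated assumptions, not the authors' algorithm, which the abstract does not describe. Contigs are nodes, and a directed link between two contigs carries an estimated gap in base pairs (assumed here to come from something like paired reads). A scaffold is accepted when the chain from one marker contig to the other spans the estimated inter-marker distance within a tolerance. The names `ContigGraph`, `find_scaffold`, `expected`, and `tolerance` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContigGraph:
    """Hypothetical contig graph: contig lengths plus directed links carrying an estimated gap (bp)."""
    length: dict[str, int] = field(default_factory=dict)
    links: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    def add_contig(self, name: str, length: int) -> None:
        self.length[name] = length
        self.links.setdefault(name, [])

    def add_link(self, src: str, dst: str, gap: int) -> None:
        self.links[src].append((dst, gap))


def find_scaffold(g: ContigGraph, start: str, end: str,
                  expected: int, tolerance: int) -> list[str] | None:
    """Illustrative bounded DFS (an assumption, not the paper's method).

    The span of a chain is measured from the end of `start` to the beginning
    of `end` (gaps plus intervening contig lengths) and must lie within
    `expected +/- tolerance`. Gaps are assumed non-negative, so a partial
    chain longer than `expected + tolerance` can be pruned.
    """
    def dfs(node: str, span: int, path: list[str]) -> list[str] | None:
        for nxt, gap in g.links.get(node, []):
            if nxt in path:
                continue  # avoid cycles, e.g. ones introduced by repeat contigs
            if nxt == end:
                if abs(span + gap - expected) <= tolerance:
                    return path + [nxt]
                continue  # reached the far marker at the wrong distance
            new_span = span + gap + g.length[nxt]
            if new_span > expected + tolerance:
                continue  # prune: this chain is already too long
            found = dfs(nxt, new_span, path + [nxt])
            if found is not None:
                return found
        return None

    return dfs(start, 0, [start])


# Usage: marker contigs A and B are estimated to be about 4,000 bp apart.
g = ContigGraph()
for name, size in [("A", 500), ("c1", 2000), ("c2", 1500), ("B", 500)]:
    g.add_contig(name, size)
g.add_link("A", "c1", 100)
g.add_link("c1", "c2", 200)
g.add_link("c2", "B", 150)
print(find_scaffold(g, "A", "B", expected=4000, tolerance=500))  # ['A', 'c1', 'c2', 'B']
```

A plain depth-first search like this can take exponential time when repeats create many alternative links, and it ignores contig orientation; a practical method would need to handle both.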
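The abstract says that excessive coverage is the only signal for repeats in the simulation, but not how that coverage is measured. As one simple possibility, the hypothetical `flag_repeats` below applies a Poisson over-coverage test, a technique swapped in for illustration rather than taken from the paper. If unique sequence receives read starts uniformly at rate N/G per base, a contig of length L should contain about λ = NL/G of them. A contig whose count has an upper-tail probability below a threshold `alpha` is flagged as a likely collapsed repeat. The genome length G is assumed known here, and `poisson_sf` and `alpha` are likewise illustrative.

```python
from __future__ import annotations

import math


def poisson_sf(n: int, lam: float) -> float:
    """P(X >= n) for X ~ Poisson(lam), summing upper-tail terms in log space."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if n <= 0:
        return 1.0
    total, k = 0.0, n
    while True:
        term = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += term
        k += 1
        # Once k exceeds lam the remaining terms shrink monotonically,
        # so stop when the latest term is negligible relative to the sum.
        if k > lam and term <= total * 1e-16:
            break
    return min(total, 1.0)


def flag_repeats(contigs: dict[str, tuple[int, int]], total_reads: int,
                 genome_length: int, alpha: float = 1e-6) -> list[str]:
    """Illustrative Poisson over-coverage test (assumption, not from the paper).

    `contigs` maps a contig name to (length in bp, number of reads starting in it).
    Unique sequence is assumed to receive reads uniformly at
    `total_reads / genome_length` per bp, so its read count is Poisson distributed.
    """
    rate = total_reads / genome_length
    flagged = []
    for name, (length, reads) in contigs.items():
        if poisson_sf(reads, rate * length) < alpha:
            flagged.append(name)
    return flagged


# Usage: 10,000 reads over a 1 Mbp genome gives 0.01 read starts per bp.
contigs = {"u1": (20_000, 205), "r1": (5_000, 120)}
print(flag_repeats(contigs, total_reads=10_000, genome_length=1_000_000))  # ['r1']
```

Here r1 holds about 2.4 times its expected 50 reads and is flagged, while the 205 reads in u1 against an expectation of 200 are consistent with unique sequence.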
doi:10.1145/299432.299442 dblp:conf/recomb/AnsonM99 fatcat:f247btduizhhxghe3uexj5urtq