SoftSearch: Integration of Multiple Sequence Features to Identify Breakpoints of Structural Variations

Steven N. Hart, Vivekananda Sarangi, Raymond Moore, Saurabh Baheti, Jaysheel D. Bhavsar, Fergus J. Couch, Jean-Pierre A. Kocher, Haixu Tang
2013 PLoS ONE  
Structural variation (SV) represents a significant, yet poorly understood contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but there is no single detection tool that is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SVs, including split reads, discordant read pairs, and unmated pairs. Co-localized split reads and discordant read pairs are used to refine the breakpoints. Results: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are 1) no requirement for secondary (or exhaustive primary) alignment, 2) portability into established sequencing workflows, and 3) applicability to any DNA-sequencing experiment (e.g. whole genome, exome, custom capture, etc.). SoftSearch identifies breakpoints from a small number of soft-clipped bases from split reads and a few discordant read pairs, which on their own would not be sufficient to make an SV call. Conclusions: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch was able to call clinically relevant SVs in the BRCA2 gene not reported by other tools while offering significantly improved overall performance.
doi:10.1371/journal.pone.0083356 pmid:24358278 pmcid:PMC3865185 fatcat:ifhtezbu25fvvac22zmmdrzzpq
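
The idea described in the abstract, combining soft-clipped split reads with co-localized discordant read pairs, can be illustrated with a minimal sketch. This is not SoftSearch's implementation: the `Read` record, the function names, and every threshold (`min_clip`, `max_insert`, `window`, `min_clipped`, `min_discordant`) are hypothetical choices made for illustration. It assumes SAM-style 1-based positions and CIGAR strings, and reports a candidate breakpoint only when both kinds of evidence are present near the same position.

```python
import re
from collections import namedtuple

# Hypothetical minimal read record; field names are illustrative only.
Read = namedtuple("Read", "chrom pos cigar mate_chrom mate_pos")

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_breakpoints(read, min_clip=5):
    """Return approximate reference positions where the read is soft-clipped."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(read.cigar)]
    points = []
    # Leading soft clip: the breakpoint sits at the alignment start.
    if ops and ops[0][1] == "S" and ops[0][0] >= min_clip:
        points.append(read.pos)
    # Trailing soft clip: the breakpoint sits just past the aligned span.
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_clip:
        ref_len = sum(n for n, op in ops if op in "MDN=X")
        points.append(read.pos + ref_len)
    return points

def is_discordant(read, max_insert=1000):
    """Treat a pair as discordant if mates map to different chromosomes
    or farther apart than an assumed maximum insert size."""
    return (read.chrom != read.mate_chrom
            or abs(read.mate_pos - read.pos) > max_insert)

def candidate_breakpoints(reads, window=300, min_clipped=2, min_discordant=2):
    """Report (chrom, pos, n_clipped, n_discordant) where both signals agree."""
    clip_counts = {}
    for r in reads:
        for p in soft_clip_breakpoints(r):
            key = (r.chrom, p)
            clip_counts[key] = clip_counts.get(key, 0) + 1

    calls = []
    for (chrom, p), n_clip in sorted(clip_counts.items()):
        if n_clip < min_clipped:
            continue
        # Count discordant reads that start within the window of the clip site.
        n_disc = sum(
            1 for r in reads
            if r.chrom == chrom and abs(r.pos - p) <= window and is_discordant(r)
        )
        if n_disc >= min_discordant:
            calls.append((chrom, p, n_clip, n_disc))
    return calls
```

In this sketch neither signal triggers a call alone: a position supported by two clipped reads is dropped unless at least two discordant pairs fall within the window, which mirrors the abstract's point that weak individual signals become useful when combined.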