New synthetic-diploid benchmark for accurate variant calling evaluation

Heng Li, Jonathan M Bloom, Yossi Farjoun, Mark Fleharty, Laura D Gauthier, Benjamin Neale, Daniel MacArthur
2017, bioRxiv preprint
Existing benchmark datasets for evaluating variant-calling accuracy are constructed from the consensus of multiple variant callers run on short-read data, and are therefore biased toward easy regions that known algorithms can already access. We derived a new benchmark dataset from the de novo PacBio assemblies of two human cell lines that are homozygous across the whole genome. Because each cell line is homozygous, each assembly represents a single haplotype, and combining the two yields a synthetic diploid. This benchmark provides a more accurate and less biased estimate of the error rate of small variant calls in a realistic context.
doi:10.1101/223297 fatcat:gc2bgedjrranxisp4jhld3wym4
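The synthetic-diploid idea can be illustrated with a small sketch. It assumes each homozygous assembly has already been reduced to a set of small variants keyed by `(chrom, pos, ref, alt)`. This representation and the helper names `synthetic_diploid`, `unphased`, and `evaluate` are hypothetical and are not the authors' pipeline or file format. A variant found in both haploid assemblies becomes homozygous ALT in the synthetic diploid, and a variant found in only one becomes heterozygous. A query call set is then scored against this truth. The exact-match comparison here is a deliberate simplification: real comparison tools such as hap.py or RTG vcfeval perform haplotype-aware matching that handles variants represented differently, and benchmarks usually restrict scoring to confident regions.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of a small variant: (chrom, pos, ref, alt).
Variant = Tuple[str, int, str, str]


def synthetic_diploid(hap1: Set[Variant], hap2: Set[Variant]) -> Dict[Variant, str]:
    """Merge variants from two haploid (homozygous) assemblies into diploid genotypes."""
    truth: Dict[Variant, str] = {}
    for v in hap1 | hap2:
        if v in hap1 and v in hap2:
            truth[v] = "1|1"  # present on both haplotypes: homozygous ALT
        elif v in hap1:
            truth[v] = "1|0"  # present only in the first assembly
        else:
            truth[v] = "0|1"  # present only in the second assembly
    return truth


def unphased(gt: str) -> str:
    """Drop phasing so that '1|0', '0|1' and '0/1' all compare equal."""
    return "/".join(sorted(gt.replace("|", "/").split("/")))


def evaluate(truth: Dict[Variant, str], calls: Dict[Variant, str]) -> Dict[str, float]:
    """Naive exact-match scoring: a call is a TP only if allele AND genotype agree.

    A call at a true site with the wrong genotype counts as both an FP and an FN
    under this simplified convention.
    """
    tp = sum(
        1
        for v, gt in calls.items()
        if v in truth and unphased(gt) == unphased(truth[v])
    )
    fp = len(calls) - tp
    fn = len(truth) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

For example, with `hap1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}` and `hap2 = {("chr1", 100, "A", "G"), ("chr1", 300, "G", "GA")}`, the site at position 100 becomes `1|1` and the other two become heterozygous. A call set that gets positions 100 and 200 right, misses 300, and adds a spurious call at 400 scores 2 TP, 1 FP and 1 FN. Keying on genotype as well as allele is what lets a diploid truth set penalize zygosity errors, which a haploid truth set could not express.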