Assessing performance of pathogenicity predictors using clinically-relevant variant datasets [article]

Adam C Gunning, Verity Fryer, James Fasham, Andrew H Crosby, Sian Ellard, Emma Baple, Caroline F Wright
2020 biorxiv/medrxiv   pre-print
Purpose: Pathogenicity predictors are an integral part of genomic variant interpretation but, despite their widespread usage, an independent validation of performance using a clinically-relevant dataset has not been undertaken. Methods: We derive two validation datasets: an "open" dataset containing variants extracted from publicly-available databases, similar to those commonly applied in previous benchmarking exercises, and a "clinically-representative" dataset containing variants identified through research/diagnostic exome and diagnostic panel sequencing. Using these datasets, we evaluate the performance of three recently developed meta-predictors, REVEL, GAVIN and ClinPred, and compare their performance against two commonly used in silico tools, SIFT and PolyPhen-2. Results: Although the newer meta-predictors outperform the older tools, the performance of all pathogenicity predictors is substantially lower in the clinically-representative dataset. Using our clinically-relevant dataset, REVEL performed best with an area under the ROC curve of 0.81. Using a concordance-based approach based on a consensus of multiple tools reduces performance because of both discordance between tools and false concordance, where tools make the same misclassification. Analysis of tool feature usage may give insight into tool performance and misclassification. Conclusion: Our results support the adoption of meta-predictors over traditional in silico tools, but do not support a consensus-based approach as recommended by current variant classification guidelines.
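The two quantities the abstract relies on, area under the ROC curve and consensus between tools, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `roc_auc` and `consensus_calls` are hypothetical, the scores and calls are invented toy values rather than output from REVEL, SIFT or any other tool, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from itertools import product


def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 = pathogenic, 0 = benign.
    scores: higher means "more likely pathogenic".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    # Fraction of pathogenic/benign pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def consensus_calls(tool_calls):
    """Per-variant call when all tools agree, otherwise None (discordant)."""
    return [calls[0] if len(set(calls)) == 1 else None
            for calls in zip(*tool_calls)]


# Toy data, invented for illustration only (not taken from the paper).
truth = [1, 1, 1, 0, 0, 0]                    # 1 = pathogenic, 0 = benign
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]   # one hypothetical predictor
tool_a = [1, 1, 0, 1, 0, 0]                   # binary calls, hypothetical tool A
tool_b = [1, 0, 0, 1, 0, 0]                   # binary calls, hypothetical tool B

auc = roc_auc(truth, toy_scores)
calls = consensus_calls([tool_a, tool_b])

# Discordance: the tools disagree, so the consensus yields no call.
discordant = [i for i, c in enumerate(calls) if c is None]
# False concordance: the tools agree with each other but are both wrong.
false_concordant = [i for i, c in enumerate(calls)
                    if c is not None and c != truth[i]]

print(f"AUC = {auc:.3f}")
print("discordant variants:", discordant)
print("falsely concordant variants:", false_concordant)
```

In this toy case the consensus loses one variant to disagreement and misclassifies two more on which both tools make the same error, which are the two failure modes the Results attribute to the concordance-based approach.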
doi:10.1101/2020.02.06.937169 fatcat:pmq2xqmsu5caha4wuantc3lvsa