Variant pathogenic prediction by locus variability, the importance of the last picture of evolution [article]

Jose Luis Cabrera, Jose Antonio Enriquez, Jorge Garcia, Fatima Sanchez-Cabo
2020 bioRxiv   pre-print
Accurate detection of pathogenic single nucleotide variants (SNVs) is a key problem in variant ranking for whole exome sequencing studies. Several in silico tools have been developed to identify deleterious variants. Locus variability, computed as Shannon entropy from gnomAD/helixMTdb variant allele frequencies, can be used as a predictor of pathogenic variants. In this study we evaluate the use of Shannon entropy in non-coding mitochondrial DNA and also in coding regions subject to an additional selective pressure beyond that imposed by the genetic code, such as splice sites. To benchmark this approach on non-coding mitochondrial variants, Shannon entropy was compared with the HmtVar disease score and outperformed it in non-coding SNVs (AUCH=0.99 in the ROC curve and PR-AUCH=1.00 in the precision-recall curve). Likewise, for splice-site variants, Shannon entropy was compared against two state-of-the-art ensemble predictors, ada score and rf score, and matched their overall performance in both ROC curves (AUCH=0.95) and precision-recall curves (PR-AUC=0.97). Therefore, locus variability could aid the variant-ranking process for these specific types of SNVs.
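As a rough illustration of the locus-variability idea, the sketch below computes the Shannon entropy of the allele distribution at a single locus. It is not the authors' implementation: the function name `locus_entropy`, the use of base-2 logarithms, and the choice to treat the reference allele frequency as one minus the sum of the alternate allele frequencies are all assumptions made for this example.

```python
import math


def locus_entropy(alt_freqs):
    """Shannon entropy (in bits) of the allele distribution at one locus.

    alt_freqs: alternate allele frequencies observed at the locus
    (hypothetical input, e.g. taken from a population database).
    Assumption: the reference allele carries the remaining frequency mass.
    """
    alt = [f for f in alt_freqs if f > 0]
    if sum(alt) > 1.0 + 1e-9:
        raise ValueError("alternate allele frequencies sum to more than 1")
    # Clamp to zero to absorb small floating-point overshoot.
    ref = max(0.0, 1.0 - sum(alt))
    probs = alt + ([ref] if ref > 0 else [])
    # Terms with p == 0 are skipped, following the convention 0 * log(0) = 0.
    return sum(-p * math.log2(p) for p in probs)


# An invariant locus has zero entropy; a locus with rare variants
# has an entropy slightly above zero.
invariant = locus_entropy([])
rare = locus_entropy([0.001, 0.0005])
```

Under these assumptions, entropy is zero when no variation is observed and reaches its maximum of 2 bits when all four nucleotides are equally frequent.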
doi:10.1101/2020.11.06.371195 fatcat:3un7ybvui5dlxjxnykk2wdmbji