Algorithms for large-scale genotyping microarrays

W.-m. Liu, X. Di, G. Yang, H. Matsuzaki, J. Huang, R. Mei, T. B. Ryder, T. A. Webster, S. Dong, G. Liu, K. W. Jones, G. C. Kennedy (+1 others)
2003 Bioinformatics  
Motivation: Analysis of many thousands of single nucleotide polymorphisms (SNPs) across the whole genome is crucial for efficiently mapping disease genes and for understanding susceptibility to diseases, drug efficacy and side effects in different populations and individuals. High-density oligonucleotide microarrays make such analysis possible at reasonable cost. Such analysis requires accurate, reliable methods for feature extraction, classification, statistical modeling and filtering.
Results: We propose a modified partitioning around medoids (PAM) method for classifying relative allele signals. We use the average silhouette width, separation and other quantities as quality measures for genotyping classification. We build robust statistical models from the classification results and use these models to make genotype calls and to calculate quality measures for the calls. We apply our algorithms to several different genotyping microarrays. We use reference types, informative Mendelian relationships in families, and leave-one-out cross-validation to verify our results. The concordance rates with the single base extension reference types are 99.36% for the SNPs on autosomes and 99.64% for the SNPs on sex chromosomes. The concordance of the leave-one-out test is over 99.5% and is 99.9% higher for AA, AB and BB cells. We also provide a method to determine the gender of a sample based on the heterozygous call rate of SNPs on the X chromosome. See
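As an illustration of the classification step, the sketch below clusters one SNP's relative allele signals into three groups and scores the result with the average silhouette width. It is not the authors' modified PAM: it uses a brute-force k-medoids search that minimises the same objective as PAM, which is only practical for small inputs. The function names, the sample values, the definition of relative allele signal as A / (A + B), and the mapping of clusters to genotypes are all assumptions made for the example.

```python
from itertools import combinations


def k_medoids_1d(values, k):
    """Brute-force k-medoids on 1-D data (same objective as PAM, not PAM's
    build/swap search). Returns the medoids in ascending order and a label
    (index into the medoids) for each value."""
    best_cost, best_medoids = None, None
    for medoids in combinations(sorted(set(values)), k):
        # Cost = sum of distances from each point to its nearest medoid.
        cost = sum(min(abs(v - m) for m in medoids) for v in values)
        if best_cost is None or cost < best_cost:
            best_cost, best_medoids = cost, medoids
    labels = [min(range(k), key=lambda j: abs(v - best_medoids[j]))
              for v in values]
    return best_medoids, labels


def average_silhouette_width(values, labels):
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points.
    Assumes at least two clusters; singleton clusters score 0."""
    n = len(values)
    clusters = sorted(set(labels))
    widths = []
    for i in range(n):
        own = [abs(values[i] - values[j]) for j in range(n)
               if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(abs(values[i] - values[j]) for j in range(n)
                if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / n


# Hypothetical relative allele signals for one SNP across nine samples,
# assuming RAS = A / (A + B).
ras = [0.95, 0.92, 0.97, 0.50, 0.48, 0.53, 0.05, 0.08, 0.03]
medoids, labels = k_medoids_1d(ras, k=3)

# Assumed mapping: lowest-RAS cluster is BB, middle is AB, highest is AA.
genotype_names = ["BB", "AB", "AA"]
calls = [genotype_names[label] for label in labels]
asw = average_silhouette_width(ras, labels)
print(medoids, calls, round(asw, 3))
```

A high average silhouette width indicates tight, well-separated genotype clusters; a SNP whose width falls below some chosen cutoff could be flagged for filtering, though the cutoff itself would have to come from the full paper or from validation data.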
doi:10.1093/bioinformatics/btg332 pmid:14668223 fatcat:zi6oaf7hjreq3mxb2ziac2f3ue