Pyrobayes: an improved base caller for SNP discovery in pyrosequences

Aaron R Quinlan, Donald A Stewart, Michael P Strömberg, Gábor T Marth
2008 Nature Methods  
Supplementary figures and text: Supplementary Figure 1. The Pyrobayes base calling approach. Supplementary Figure 2. Estimation of data likelihoods with parent distributions. Supplementary Figure 3. Concordance of base errors in the Pyrobayes and the native 454 base calls. Supplementary Figure 4. Distribution of base quality scores within homopolymer runs. Supplementary Figure 5. Base quality accuracy for the 454 Life Sciences FLX model. Supplementary Methods.

Supplementary Figure 1. The PYROBAYES base calling approach. a, Data likelihoods: the distributions of measured nucleotide incorporation signal intensity values for various homopolymer lengths. b, Priors: the observed frequencies of homopolymer lengths in eight different organismal genome sequences. The theoretical expectation of exponential decay is also included. c, Bayesian posteriors: the posterior probabilities of homopolymer lengths 0-5, calculated from the data likelihoods (panel a) and the prior probabilities (panel b). d, The most likely number of bases for eight consecutive nucleotide tests. The base quality value assigned to each called base is the probability that the base in question was not an overcall.

Supplementary Figure 2. Estimation of data likelihoods with parent distributions. The observed distributions of the incorporation intensity signal for various homopolymer lengths are marked with black x marks: (a) Length = 1. (b) Length = 2. (c) Length = 3. (d) Length = 4. The approximating (non-central Student's t) distributions are indicated in red.
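The posterior calculation described for Supplementary Figure 1 (panels c and d) can be illustrated with a minimal Python sketch. It is not the Pyrobayes implementation. Everything numeric here is an assumption made for illustration: the likelihoods are plain normal densities centered on the homopolymer length (the source uses non-central Student's t approximations fitted to observed signals), the prior is a simple exponential decay with a made-up rate rather than the observed genome frequencies, and the names `homopolymer_posteriors` and `call_bases` are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (a stand-in for the fitted likelihoods)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def homopolymer_posteriors(signal, max_len=5, decay=0.5,
                           sd_base=0.15, sd_slope=0.05):
    """Posterior P(length = n | signal) for n = 0..max_len via Bayes' rule.

    Assumed, not from the source: prior proportional to decay**n, and a
    normal likelihood centered at n whose spread grows with n.
    """
    lengths = range(max_len + 1)
    priors = [decay ** n for n in lengths]
    likelihoods = [normal_pdf(signal, n, sd_base + sd_slope * n)
                   for n in lengths]
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # normalizing constant
    return [j / total for j in joint]


def call_bases(signal, **kwargs):
    """Return the most likely homopolymer length and per-base probabilities.

    For the k-th called base (k = 1..called), the reported value is
    P(length >= k | signal), i.e. the probability that the base is not
    an overcall.
    """
    post = homopolymer_posteriors(signal, **kwargs)
    called = max(range(len(post)), key=post.__getitem__)
    not_overcall = [sum(post[k:]) for k in range(1, called + 1)]
    return called, not_overcall


if __name__ == "__main__":
    for s in (0.05, 1.02, 2.10, 2.55):
        n, probs = call_bases(s)
        print(s, n, [round(p, 4) for p in probs])
```

Under these assumptions the probabilities within a homopolymer run can only decrease from the first called base to the last, because each later base requires a longer true run; an ambiguous signal between two lengths therefore lowers the value of the last base most.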
doi:10.1038/nmeth.1172 pmid:18193056 fatcat:dmxyelmjozdtlab7xmxin3ftyi