Alpha diversity metrics for noisy OTUs [article]

Robert C. Edgar, Henrik Flyvbjerg
2018 biorxiv/medrxiv pre-print
Abstract: Next-generation sequencing (NGS) of marker genes such as 16S ribosomal RNA is widely used to survey microbial communities. The in-sample (alpha) diversity of Operational Taxonomic Units (OTUs) is often summarized by metrics such as richness or entropy, which are calculated from observed abundances, or by estimators such as Chao1, which extrapolate to unobserved OTUs. Most such measures are adopted from traditional biodiversity studies, where observational error can often be neglected. However, errors introduced by next-generation amplicon sequencing tend to induce spurious OTUs and spurious counts in OTU tables, both of which are especially prevalent at low abundances. In consequence, traditional metrics may be grossly inaccurate if they are naively applied to NGS OTU tables. In this work, we describe two novel alpha diversity estimators which are calculated from OTU abundances above a specified threshold. The singleton-free estimator (SFE) is a non-parametric estimator derived by an approach similar to that of Chao1, but it extrapolates using doublet and triplet abundances rather than singletons and doublets. The octave estimator (OE) fits a log-normal distribution to the non-singleton bars of an octave plot. We show that these estimators are effective under suitable conditions, but these conditions rarely apply in practice. We conclude that extrapolating to unobserved OTUs remains an open problem which is unlikely to be solved in the near future.
doi:10.1101/434977 fatcat:zieyuqe6tzh35hu5scqrshcakq
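
As an illustration of the quantities the abstract refers to, the following minimal Python sketch computes the classic bias-corrected Chao1 estimate from singleton and doublet counts, and bins OTU abundances into octaves with an optional abundance threshold. It does not implement the paper's SFE or OE, whose formulas are not given in this record. The function names, the example abundances, and the binning convention (octave k holds abundances from 2^k to 2^(k+1) - 1) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def chao1(abundances: Iterable[int]) -> float:
    """Bias-corrected Chao1: S_obs + F1*(F1 - 1) / (2*(F2 + 1))."""
    counts = [a for a in abundances if a > 0]
    s_obs = len(counts)                    # observed OTUs
    f1 = sum(1 for a in counts if a == 1)  # singletons
    f2 = sum(1 for a in counts if a == 2)  # doublets
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def octave_bins(abundances: Iterable[int], min_abundance: int = 1) -> Dict[int, int]:
    """Count OTUs per octave, ignoring abundances below min_abundance.

    Assumed convention: octave k holds abundances in [2**k, 2**(k+1) - 1],
    so octave 0 contains only singletons.
    """
    bins: Counter = Counter()
    for a in abundances:
        if a > 0 and a >= min_abundance:
            bins[a.bit_length() - 1] += 1
    return dict(sorted(bins.items()))


# Hypothetical abundances for one sample (one entry per OTU).
example = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]
print(chao1(example))                         # 11.0
print(octave_bins(example))                   # {0: 3, 1: 3, 2: 1, 3: 2, 5: 1}
print(octave_bins(example, min_abundance=2))  # {1: 3, 2: 1, 3: 2, 5: 1}
```

Setting `min_abundance=2` drops the singleton bar, which mirrors the abstract's idea of working only with abundances above a threshold. Fitting a log-normal distribution to the remaining bars is left out, since the fitting procedure is not described here.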