Machine learning classifiers provide insight into the relationship between microbial communities and bacterial vaginosis

Daniel Beck, James A. Foster
2015 BioData Mining  
Bacterial vaginosis (BV) is a disease associated with the vaginal microbiome. It is highly prevalent and is characterized by symptoms including odor, discharge and irritation. No single microbe has been found to cause BV. In this paper we use random forests and logistic regression classifiers to model the relationship between the microbial community and BV. We use subsets of the microbial community features in order to determine which features are important to the classification models. Results:
We find that models generated using logistic regression and random forests perform nearly identically and identify largely similar important features. Only a few features are necessary to obtain high BV classification accuracy. Additionally, there appears to be substantial redundancy between the microbial community features. Conclusions: These results are in contrast to a previous study in which the important features identified by the classifiers were dissimilar. This difference appears to be the result of using different feature importance measures. It is not clear whether machine learning classifiers are capturing patterns different from simple correlations.
doi:10.1186/s13040-015-0055-3 pmid:26294933 pmcid:PMC4542107 fatcat:w34bq4uokvgsnlcc4xxhlqx3hm
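
The feature-subset idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline or data: it assumes synthetic samples with four hypothetical features (two correlated and informative, two pure noise), fits a plain gradient-descent logistic regression written in standard-library Python, and compares test accuracy for the full feature set against single-feature subsets. Random forests are omitted to keep the example dependency-free. All names (`make_data`, `fit_logistic`, `subset_accuracies`) and parameter values are illustrative assumptions.

```python
import math
import random


def make_data(n, rng):
    """Synthetic stand-in for community data (assumption, not real BV data).

    Features 0 and 1 are correlated and carry the class signal;
    features 2 and 3 are noise unrelated to the label.
    """
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        signal = rng.gauss(1.5 if label else -1.5, 1.0)
        X.append([signal, signal + rng.gauss(0, 0.3),
                  rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(row, cols, w, b):
    return sigmoid(b + sum(wi * row[c] for wi, c in zip(w, cols)))


def fit_logistic(X, y, cols, epochs=200, lr=0.1):
    """Batch gradient descent on the logistic loss, using only columns `cols`."""
    w, b, n = [0.0] * len(cols), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(cols), 0.0
        for row, target in zip(X, y):
            err = predict(row, cols, w, b) - target
            for j, c in enumerate(cols):
                grad_w[j] += err * row[c]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(X, y, cols, w, b):
    hits = sum((predict(row, cols, w, b) >= 0.5) == bool(t)
               for row, t in zip(X, y))
    return hits / len(X)


def subset_accuracies(seed=0):
    """Train on each feature subset and report held-out accuracy."""
    rng = random.Random(seed)
    X_train, y_train = make_data(300, rng)
    X_test, y_test = make_data(300, rng)
    subsets = {"all": [0, 1, 2, 3], "f0": [0], "f1": [1], "f2": [2], "f3": [3]}
    results = {}
    for name, cols in subsets.items():
        w, b = fit_logistic(X_train, y_train, cols)
        results[name] = accuracy(X_test, y_test, cols, w, b)
    return results


if __name__ == "__main__":
    for name, acc in subset_accuracies().items():
        print(f"{name}: {acc:.3f}")
```

In this toy setting either informative feature alone should classify about as well as the full set, which mirrors the redundancy the abstract describes; it says nothing about which taxa matter in real data.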