Dynamics and associations of microbial community types across the human body
A primary goal of the Human Microbiome Project (HMP) was to provide a reference collection of 16S ribosomal RNA gene sequences collected from sites across the human body that would allow microbiologists to better associate changes in the microbiome with changes in health 1 . The HMP Consortium has reported the structure and function of the human microbiome in 300 healthy adults at 18 body sites from a single time point 2,3 . Using additional data collected over the course of 12-18 months, we
... d Dirichlet multinomial mixture models 4 to partition the data into community types for each body site and made three important observations. First, community types at several body sites were strongly associated with whether individuals had been breastfed as infants, their gender, and their level of education. Second, although the specific taxonomic compositions of the oral and gut microbiomes were different, the community types observed at these sites were predictive of each other. Finally, over the course of the sampling period, the community types from sites within the oral cavity were the least stable, whereas those in the vagina and gut were the most stable. Our results demonstrate that even with the considerable intra- and interpersonal variation in the human microbiome, this variation can be partitioned into community types that are predictive of each other and are probably the result of life-history characteristics. Understanding the diversity of community types and the mechanisms that result in an individual having a particular type or changing types will allow us to use their community types to assess disease risk and to personalize therapies.
Building on previous analysis of a healthy cohort of 300 individuals, we analysed a 16S rRNA gene sequence data set from the HMP Consortium 2,3 . The final data release for this cohort provided 16S rRNA gene sequence data and clinical metadata (Extended Data Table 1) from two time points for each of 300 healthy individuals and from a third time point for 100 of the individuals at 15 body sites for men and 18 for women 5 ; the interval between samplings varied between 30 and 451 days (median = 224 days). A significant difficulty in analysing microbiome data has been the considerable intra- and interpersonal variation in the composition of the human microbiome 3,6,7 .
A recently proposed approach for overcoming this difficulty within the gastrointestinal tract has been the concept of enterotypes, or more generically, stool community types 4,8,9 . In this approach, samples are clustered into bins based on their taxonomic similarity. Specific enterotypes have been associated with the amount of protein, fat and carbohydrates in one's diet, obesity, inflammatory bowel disease, and Crohn's disease 4, . Others have found associations between specific vaginal community types and the sexually transmitted Trichomonas vaginalis, pH, and ethnicity 12-14 and associations between skin community types and psoriasis 15 . Using bacterial community structures collected from 18 body sites and up to three time points, we applied community typing analysis to understand better the factors that affect the structure of the microbiome and contribute to human health. Concern has been expressed regarding whether community types reflect partitioning of an abundance gradient or the presence of clusters of relative abundance profiles 8,16 . Two general approaches have been developed to assign samples to community types: partitioning around the medoid (PAM) and Dirichlet multinomial mixture (DMM) models 4,8 . To compare these methods, we first generated simulated communities containing either one or four community types. Analysis of the simulated communities indicated that the negative log model evidence metric used by the DMM-based approach was superior to the metrics used to assess clusters within the PAM-based approach (Supplementary Information). Next, we assigned the samples for each body site to community types using both methods. Calculation of the negative log model evidence demonstrated that the community types identified using DMM were superior to those identified using the PAM-based approach (Extended Data Table 2 and Extended Data Fig. 1).
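As a rough illustration of the simulation idea described above, the following Python sketch draws samples from one or several Dirichlet multinomial components and assigns each sample to the component under which it is most likely. It is not the procedure used in this study: the component parameters are taken as known rather than fitted, no model evidence is computed, and every name and value (make_components, the concentration values 20 and 1, eight taxa, a depth of 500 reads, equal mixing weights) is a hypothetical choice made only for the example.

```python
import math
import random


def make_components(n_types, n_taxa, high=20.0, low=1.0):
    """Build hypothetical Dirichlet parameter vectors, one per community type.
    Each type is dominated by its own block of taxa (assumed values)."""
    per_type = n_taxa // n_types
    components = []
    for k in range(n_types):
        alpha = [low] * n_taxa
        for j in range(k * per_type, (k + 1) * per_type):
            alpha[j] = high
        components.append(alpha)
    return components


def sample_counts(alpha, depth, rng):
    """Draw one sample: a Dirichlet abundance profile, then `depth` reads."""
    # Independent gamma draws, used as weights, are proportional to a Dirichlet draw.
    weights = [rng.gammavariate(a, 1.0) for a in alpha]
    counts = [0] * len(alpha)
    for taxon in rng.choices(range(len(alpha)), weights=weights, k=depth):
        counts[taxon] += 1
    return counts


def simulate(components, n_samples, depth, rng):
    """Return (true_type, counts) pairs, picking types with equal probability."""
    data = []
    for _ in range(n_samples):
        k = rng.randrange(len(components))
        data.append((k, sample_counts(components[k], depth, rng)))
    return data


def dm_log_likelihood(counts, alpha):
    """Dirichlet multinomial log-likelihood without the multinomial coefficient,
    which depends only on the counts and cancels when comparing components."""
    n = sum(counts)
    a0 = sum(alpha)
    ll = math.lgamma(a0) - math.lgamma(n + a0)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(x + a) - math.lgamma(a)
    return ll


def assign_type(counts, components):
    """Index of the component with the highest log-likelihood for this sample."""
    scores = [dm_log_likelihood(counts, alpha) for alpha in components]
    return max(range(len(components)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    components = make_components(n_types=4, n_taxa=8)
    data = simulate(components, n_samples=200, depth=500, rng=rng)
    correct = sum(assign_type(c, components) == k for k, c in data)
    print("fraction assigned to the generating type:", correct / len(data))
```

Calling make_components(1, 8) instead gives the single-type case. A full DMM analysis would additionally estimate the component parameters and mixing weights from the data and compare candidate numbers of components by their model evidence, which this sketch does not attempt.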
Thus, our analysis of simulated data and the HMP data suggests that the community types represent clusters of relative abundance profiles. Using the DMM-based approach, we identified between two (anterior nares) and seven (tongue dorsum) community types per body site (see Source Data associated with Fig.