Recovering high-quality host genomes from gut metagenomic data through genotype imputation [article]

Sofia Marcos, Melanie Parejo, Andone Estonba, Antton Alberdi
2021 bioRxiv pre-print
Abstract
Metagenomic data sets of host-associated microbial communities often contain host DNA that is usually discarded because the amount of data is too low for accurate host genetic analyses. However, if a reference panel is available, genotype imputation can be employed to reconstruct host genotypes and make the most of data that would otherwise go unused. We tested the performance of a two-step strategy to impute genotypes from four types of reference panels, composed of deeply sequenced chickens, to low-depth host genome (~2x coverage) data recovered from metagenomic samples of chicken intestines. The target chicken population was formed by two broiler breeds, and the four reference panels employed were (i) an internal panel formed by population-specific individuals, (ii) an external panel created from a public database, (iii) a combined panel of the previous two, and (iv) a diverse panel including more distant populations. Imputation accuracy was high for all tested panels (concordance >0.90), although samples with coverage under 0.28x consistently showed the lowest accuracies. The best imputation performance was achieved by the combined panel, owing to the high number of imputed variants, including low-frequency ones. However, common population genetics parameters measured to characterise the chicken populations, including observed heterozygosity, nucleotide diversity, pairwise distances and kinship, were only minimally affected by panel choice, with all four panels yielding suitable results for host population characterisation and comparison. Likewise, genome scans between the two studied broiler breeds using imputed data with each panel consistently identified the same sweep regions. In conclusion, we show that the applied imputation strategy makes it possible to leverage hitherto discarded host DNA to gain insights into the genetic structure of host populations and, in doing so, facilitates the implementation of hologenomic approaches that jointly analyse host genomic and microbial metagenomic data.

Author summary
We introduce and assess a methodological approach that enables the recovery of animal genomes from complex mixtures of metagenomic data, and thus expands the portfolio of analyses that can be conducted on samples such as faeces and gut contents. Metagenomic data sets of host-associated microbial communities often contain DNA of the host organism.
The principal drawback of using these data for host genomic characterisation is the low proportion and quality of the host DNA. To leverage these data, we propose a two-step imputation method that recovers a high density of variants. We tested the pipeline on a chicken metagenomic data set, validated imputation accuracy statistics, and studied common population genetics parameters to assess how these are affected by genotype imputation and the choice of reference panel. Being able to analyse both domains from the same data set could considerably reduce sampling and laboratory effort and resources, thereby yielding more sustainable practices for future studies that embrace a hologenomic approach, jointly analysing animal genomic and microbial metagenomic features.
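The two summary statistics named above, genotype concordance and observed heterozygosity, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the 0/1/2 alternate-allele coding, the use of `None` for missing calls, and the toy genotype vectors are all assumptions made for illustration, and the preprint may define concordance differently (for example, restricted to particular genotype classes).

```python
from typing import Optional, Sequence

# Assumed coding: 0 = homozygous reference, 1 = heterozygous,
# 2 = homozygous alternate, None = missing call.
Genotype = Optional[int]


def genotype_concordance(imputed: Sequence[Genotype],
                         truth: Sequence[Genotype]) -> float:
    """Fraction of sites where the imputed genotype equals the validation genotype.

    Sites missing in either vector are skipped.
    """
    if len(imputed) != len(truth):
        raise ValueError("genotype vectors must have the same length")
    compared = 0
    matches = 0
    for imp, tru in zip(imputed, truth):
        if imp is None or tru is None:
            continue
        compared += 1
        if imp == tru:
            matches += 1
    if compared == 0:
        raise ValueError("no sites with calls in both vectors")
    return matches / compared


def observed_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Proportion of called sites at which the individual is heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    return sum(1 for g in called if g == 1) / len(called)


# Toy data (hypothetical, not taken from the study).
truth = [0, 1, 2, 1, 0, None, 2, 1, 0, 2]
imputed = [0, 1, 2, 0, 0, 1, 2, 1, 0, 2]

concordance = genotype_concordance(imputed, truth)
het_truth = observed_heterozygosity(truth)
het_imputed = observed_heterozygosity(imputed)
print(f"concordance={concordance:.3f}, "
      f"Ho(truth)={het_truth:.3f}, Ho(imputed)={het_imputed:.3f}")
```

Comparing the heterozygosity of the imputed and validation vectors in this way mirrors, in miniature, the question of whether a population genetics parameter is sensitive to imputation.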
doi:10.1101/2021.10.25.465664 fatcat:n6ne6hgiurdxparqcipicpkcce