The carbon footprint of bioinformatics [article]

Jason G Grealey, Loïc Lannelongue, Woeiyuh Saw, Jonathan Marten, Guillaume Méric, Sergio Ruiz Carmona, Michael Inouye
2021 bioRxiv   pre-print
Bioinformatic research relies on large-scale computational infrastructures which have a non-zero carbon footprint. So far, no study has quantified the environmental costs of bioinformatic tools and commonly run analyses. In this study, we estimate the bioinformatic carbon footprint (in kilograms of CO2 equivalent units, kgCO2e) using the freely available Green Algorithms calculator (www.green-algorithms.org). We assess (i) bioinformatic approaches in genome-wide association studies (GWAS), RNA
more » ... equencing, genome assembly, metagenomics, phylogenetics and molecular simulations, as well as (ii) computation strategies, such as parallelisation, CPU (central processing unit) vs GPU (graphics processing unit), cloud vs. local computing infrastructure and geography. In particular, for GWAS, we found that biobank-scale analyses emitted substantial kgCO2e and simple software upgrades could make GWAS greener, e.g. upgrading from BOLT-LMM v1 to v2.3 reduced carbon footprint by 73%. Switching from the average data centre to a more efficient data centres can reduce carbon footprint by ~34%. Memory over-allocation can be a substantial contributor to an algorithm's carbon footprint. The use of faster processors or greater parallelisation reduces run time but can lead to, sometimes substantially, greater carbon footprint. Finally, we provide guidance on how researchers can reduce power consumption and minimise kgCO2e. Overall, this work elucidates the carbon footprint of common analyses in bioinformatics and provides solutions which empower a move toward greener research.
doi:10.1101/2021.03.08.434372 fatcat:tdybp6qetrdfzo5mcswehirbd4