NuChart-II: The road to a fast and scalable tool for Hi-C data analysis

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Massimo Torquati, Marco Aldinucci
2016 The International Journal of High Performance Computing Applications
AperTO - Archivio Istituzionale Open Access dell'Università di Torino

Abstract

Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organisation of chromosomes at unprecedented scales, which makes it possible to identify physical interactions between genetic elements located throughout a genome. The use of this important information is, however, hampered by the lack of biologist-friendly analysis and visualisation software: these disciplines are literally caught in a flood of data and are now facing many of the scale-out issues that High-Performance Computing (HPC) has been addressing for years. Data must be managed, analysed and integrated, with substantial requirements in speed (in terms of execution time), application scalability and data representation. In this work we present NuChart-II, an efficient and highly optimised tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information and proposes an ex-post normalisation technique for Hi-C data. While designing NuChart-II we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems.
doi:10.1177/1094342016668567 fatcat:yomvfrtdnzbend2upksavepjfa