Improving I/O Throughput of Scientific Applications Using Transparent Parallel Compression
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
The increasing number of cores in parallel computer systems allows scientific simulations to be executed at finer spatial and temporal granularity. However, this also means that increasingly large datasets need to be output, stored, managed, and then visualized and/or analyzed using a variety of methods. In examining the possibility of using compression to accelerate all of these steps, we focus on two important questions: "Can compression help save time when data is output
... or input into, a parallel program?", and "How can a scientist's effort in using compression with a parallel program be minimized?". We focus on PnetCDF and show how transparent compression can be supported, allowing an existing simulation program to output and store data in compressed form and, similarly, allowing a data analysis application to read compressed data. We address the challenges of supporting compression when parallel writes are being performed. In our experiments, we first analyze the effects of compression with microbenchmarks, and then continue our evaluation with a scientific simulation application and two data analysis applications. While we obtain up to a factor of 2 improvement in performance for the microbenchmarks, the execution time of the simulation application improves by up to 22%, and the maximum speedup of the data analysis applications is 1.83 (with an average speedup of 1.36).
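The first question above can be made concrete with a back-of-the-envelope model. The sketch below is an illustration only, not the paper's method: it assumes compression and I/O do not overlap, and the function names, parameters, and numbers are all hypothetical. Under this assumption, compression pays off only when the compressor's throughput exceeds `bandwidth * ratio / (ratio - 1)`.

```python
def raw_write_time(size_bytes, bandwidth):
    """Seconds to write uncompressed data at `bandwidth` bytes/s."""
    return size_bytes / bandwidth


def compressed_write_time(size_bytes, ratio, compress_rate, bandwidth):
    """Seconds to compress and then write the data.

    ratio         -- original size / compressed size (hypothetical value)
    compress_rate -- compressor throughput in bytes/s (hypothetical value)
    Assumes compression and I/O are strictly sequential (no overlap).
    """
    compress_time = size_bytes / compress_rate
    write_time = (size_bytes / ratio) / bandwidth
    return compress_time + write_time


def speedup(size_bytes, ratio, compress_rate, bandwidth):
    """Values above 1.0 mean that compression saves time in this model."""
    return raw_write_time(size_bytes, bandwidth) / compressed_write_time(
        size_bytes, ratio, compress_rate, bandwidth
    )


if __name__ == "__main__":
    # Hypothetical numbers: 1 GB of output, 100 MB/s file system,
    # 400 MB/s compressor, 2:1 compression ratio.
    print(speedup(1e9, ratio=2.0, compress_rate=400e6, bandwidth=100e6))
```

In this simplified model, data that does not compress at all (a ratio of 1.0) always makes the write slower, since the compression time is pure overhead.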