Fast and Efficient Compression of Floating-Point Data
IEEE Transactions on Visualization and Computer Graphics
Large scale scientific simulation codes typically run on a cluster of CPUs that write/read time steps to/from a single file system. As data sets are constantly growing in size, this increasingly leads to I/O bottlenecks. When the rate at which data is produced exceeds the available I/O bandwidth, the simulation stalls and the CPUs are idle. Data compression can alleviate this problem by using some CPU cycles to reduce the amount of data needed to be transfered. Most compression schemes,
... are designed to operate offline and seek to maximize compression, not throughput. Furthermore, they often require quantizing floating-point values onto a uniform integer grid, which disqualifies their use in applications where exact values must be retained. We propose a simple scheme for lossless, online compression of floating-point data that transparently integrates into the I/O of many applications. A plug-in scheme for data-dependent prediction makes our scheme applicable to a wide variety of data used in visualization, such as unstructured meshes, point sets, images, and voxel grids. We achieve state-of-the-art compression rates and speeds, the latter in part due to an improved entropy coder. We demonstrate that this significantly accelerates I/O throughput in real simulation runs. Unlike previous schemes, our method also adapts well to variable-precision floating-point and integer data. Index Terms-High throughput, lossless compression, file compaction for I/O efficiency, fast entropy coding, range coder, predictive coding, large scale simulation and visualization.