ARCHIE: Data Analysis Acceleration with Array Caching in Hierarchical Storage

Bin Dong, Teng Wang, Houjun Tang, Quincey Koziol, Kesheng Wu, Suren Byna
<span title="">2018</span> <i title="IEEE"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/faqqmambavbalpofpx3p6nntua" style="color: black;">2018 IEEE International Conference on Big Data (Big Data)</a> </i> &nbsp;
Scientific data analysis typically involves reading massive amounts of data generated by simulations, experiments, and observations. Reading this data is a significant bottleneck because the data files are stored on rotating disks. Recent supercomputing systems are adding non-volatile storage layers to fill the performance gap between fast main memory and slow disk-based storage. Software libraries that manage this hierarchy must not only read data efficiently but also reduce user involvement in cross-layer data movement. Because scientific data is usually organized as arrays, these libraries also need to support array data access patterns over hierarchical storage systems. Existing software tools manage individual storage layers separately and require significant manual work to move data among the layers. In this paper, we introduce Array Caching in Hierarchical Storage (ARCHIE), which accelerates array data analyses in a seamless fashion. ARCHIE evaluates array access patterns and prefetches data with array semantics between storage layers. On a production supercomputing system, our evaluation shows that ARCHIE outperforms state-of-the-art file systems, i.e., Lustre and DataWarp, by up to 5.8× when scientific analysis applications access data.
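The idea of evaluating access patterns and prefetching between storage layers can be illustrated with a minimal sketch. This is not ARCHIE's implementation: the `ChunkCache` class, the `slow_read` callable, the one-dimensional chunk indexing, and the single-step stride predictor are all assumptions made for illustration. It models a slow layer, a small fast layer with LRU eviction, and a prefetcher that guesses the next chunk from the stride of the last two accesses.

```python
from collections import OrderedDict


class ChunkCache:
    """Toy two-layer cache: a slow layer plus a small fast layer (hypothetical)."""

    def __init__(self, slow_read, num_chunks, capacity=4):
        self.slow_read = slow_read    # hypothetical callable: chunk index -> data
        self.num_chunks = num_chunks
        self.capacity = capacity      # number of chunks the fast layer can hold
        self.fast = OrderedDict()     # fast layer, kept in LRU order
        self.last = None              # index of the previous access
        self.demand_misses = 0        # reads that had to wait on the slow layer

    def _load(self, idx):
        # Bring one chunk into the fast layer, evicting the least recently used.
        if idx in self.fast:
            self.fast.move_to_end(idx)
            return
        self.fast[idx] = self.slow_read(idx)
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)

    def read(self, idx):
        if idx not in self.fast:
            self.demand_misses += 1
        self._load(idx)
        data = self.fast[idx]
        # Infer the stride from the last two accesses and prefetch one step ahead.
        if self.last is not None:
            nxt = idx + (idx - self.last)
            if nxt != idx and 0 <= nxt < self.num_chunks:
                self._load(nxt)
        self.last = idx
        return data


# Sequential scan over 8 chunks of a toy array.
chunks = [[i] * 3 for i in range(8)]
cache = ChunkCache(lambda i: chunks[i], num_chunks=8)
for i in range(8):
    cache.read(i)
print(cache.demand_misses)  # 2: only the first two reads wait on the slow layer
```

In this sketch the prefetch happens synchronously inside `read`, so it only shows which reads would be served from the fast layer. A real system would overlap prefetching with computation and would reason about multi-dimensional array regions rather than a flat chunk index.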
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/bigdata.2018.8622616">doi:10.1109/bigdata.2018.8622616</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/bigdataconf/DongWTKWB18.html">dblp:conf/bigdataconf/DongWTKWB18</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/tqb65jrrhnenjpe7ahc2omgq74">fatcat:tqb65jrrhnenjpe7ahc2omgq74</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20190429092228/https://sdm.lbl.gov/pdc/pubs/201812-BigData-ARCHIE.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/78/75/78759077ec78b6263fd0a91481c4ecbeb26971dd.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/bigdata.2018.8622616"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>