SCALER: Scalable parallel file write in HDFS

Xi Yang, Yanlong Yin, Hui Jin, Xian-He Sun
2014 IEEE International Conference on Cluster Computing (CLUSTER)
Two camps of file systems exist: parallel file systems designed for conventional high performance computing (HPC) and distributed file systems designed for newly emerged data-intensive applications. Addressing the big data challenge requires an approach that utilizes both high performance computing and data-intensive computing power. Thus, HPC applications may need to interact with distributed file systems, such as HDFS. The N-1 (N-to-1) parallel file write is a critical technical challenge, because it is very common for HPC applications but HDFS does not allow it. This study introduces a system solution, named SCALER, which allows MPI based applications to directly access HDFS without extra data movement. SCALER supports N-1 file write at both the inter-block level and intra-block level. Experimental results confirm that SCALER achieves the design goal efficiently.
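The N-1 pattern mentioned above can be pictured with a small sketch in which N ranks each write one contiguous segment of a single shared file, and each segment is split along fixed-size block boundaries. This is not SCALER's implementation, which the abstract does not describe. The names `Piece`, `split_by_block`, and `plan_n_to_1_write`, the segment layout, and the block size are all hypothetical. Reading "inter-block" as ranks writing different blocks and "intra-block" as several ranks sharing one block is also an assumption.

```python
from dataclasses import dataclass

# Assumed block size for illustration; the real HDFS block size is configurable.
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass(frozen=True)
class Piece:
    """Part of one rank's write that falls inside a single block (hypothetical type)."""
    rank: int
    block_index: int
    offset_in_block: int
    length: int


def split_by_block(rank, file_offset, length, block_size=BLOCK_SIZE):
    """Split one rank's contiguous write into pieces that each stay within one block."""
    pieces = []
    end = file_offset + length
    while file_offset < end:
        block_index = file_offset // block_size
        offset_in_block = file_offset % block_size
        # Write up to the end of this block or the end of the segment.
        n = min(block_size - offset_in_block, end - file_offset)
        pieces.append(Piece(rank, block_index, offset_in_block, n))
        file_offset += n
    return pieces


def plan_n_to_1_write(num_ranks, bytes_per_rank, block_size=BLOCK_SIZE):
    """Assume rank r writes bytes [r * bytes_per_rank, (r + 1) * bytes_per_rank).

    Returns a dict mapping block index to the pieces written into that block.
    """
    pieces_per_block = {}
    for rank in range(num_ranks):
        start = rank * bytes_per_rank
        for piece in split_by_block(rank, start, bytes_per_rank, block_size):
            pieces_per_block.setdefault(piece.block_index, []).append(piece)
    return pieces_per_block


if __name__ == "__main__":
    # Tiny numbers so the layout is easy to follow: 4 ranks, 6 bytes each, 8-byte blocks.
    for block, pieces in sorted(plan_n_to_1_write(4, 6, block_size=8).items()):
        print(block, [(p.rank, p.offset_in_block, p.length) for p in pieces])
```

In this sketch, when `bytes_per_rank` equals the block size every block has exactly one writer, and when it is smaller several ranks land in the same block, which is the case a single-writer file system cannot serve directly.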
doi:10.1109/cluster.2014.6968736 dblp:conf/cluster/YangYJS14 fatcat:h3r7eawslzc45eo5alhdevm2ea