A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2015. File type: application/pdf.
SMARTH: Enabling Multi-pipeline Data Transfer in HDFS
2014
2014 43rd International Conference on Parallel Processing
Hadoop is a popular open-source implementation of the MapReduce programming model for handling large data sets, and HDFS is one of Hadoop's most commonly used distributed file systems. Surprisingly, we found that HDFS is inefficient when handling uploads of data files from the client's local file system, especially when the storage cluster is configured to use replicas. The root cause is HDFS's synchronous pipeline design. In this paper, we introduce an improved HDFS design called SMARTH. It utilizes asynchronous multi-pipeline data transfers instead of a single-pipeline stop-and-wait scheme.
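The abstract's central contrast, a synchronous replication pipeline that stalls the client on every acknowledgement versus streaming blocks down several replica chains concurrently, can be illustrated with a minimal Java sketch. This is a conceptual illustration under assumed helpers, not the paper's or HDFS's actual DataStreamer code; sendThroughChain() and the chain lists are hypothetical stand-ins for the real packet-forwarding protocol.

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Conceptual sketch (not HDFS source): contrasts a synchronous
 * replication pipeline, where the client blocks on the full ack chain
 * for every packet, with a multi-pipeline scheme that streams blocks
 * to replica chains concurrently.
 */
public class PipelineSketch {

    // Synchronous pipeline: each packet must be acknowledged by every
    // replica in the chain before the next packet is sent, so one slow
    // replica throttles the entire upload (the stop-and-wait behavior
    // the abstract identifies as the root cause).
    static void uploadSynchronously(List<byte[]> packets, List<String> chain) {
        for (byte[] packet : packets) {
            sendThroughChain(packet, chain); // blocks until the last replica acks
        }
    }

    // Multi-pipeline: successive blocks are handed to different replica
    // chains running in parallel, so a slow replica in one chain no
    // longer stalls transfers flowing through the others.
    static void uploadMultiPipeline(List<List<byte[]>> blocks,
                                    List<List<String>> chains) {
        ExecutorService pool = Executors.newFixedThreadPool(chains.size());
        CompletableFuture<?>[] futures = new CompletableFuture<?>[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            List<byte[]> block = blocks.get(i);
            List<String> chain = chains.get(i % chains.size());
            futures[i] = CompletableFuture.runAsync(
                () -> block.forEach(p -> sendThroughChain(p, chain)), pool);
        }
        CompletableFuture.allOf(futures).join(); // wait for all pipelines to drain
        pool.shutdown();
    }

    // Hypothetical stand-in for the network hop chain; real HDFS
    // forwards each packet datanode-to-datanode and acks in reverse.
    static void sendThroughChain(byte[] packet, List<String> chain) {
        for (String replica : chain) {
            // transfer packet to `replica` and wait for its ack (simulated)
        }
    }
}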
doi:10.1109/icpp.2014.12
dblp:conf/icpp/ZhangWH14
fatcat:vx2g4cvncbc7tarolsd7vy4iv4