Configuring a MapReduce Framework for Performance-Heterogeneous Clusters

Jessica Hartog, Renan Delvalle, Madhusudhan Govindaraju, Michael J. Lewis
2014 IEEE International Congress on Big Data
When data centers follow the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, a property we call performance-heterogeneity. Popular frameworks that support the MapReduce programming model for Big Data applications do not adapt flexibly to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker's labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer-grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests an opportunity for cluster managers to build performance-heterogeneous clusters by design, provided they also run MapReduce frameworks that can exploit them.
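The trade-off described in the abstract can be illustrated with a toy model. The sketch below is not MARLA's implementation. It is a minimal simulation under assumed conditions: work is perfectly divisible, each node has a fixed relative speed, and every sub-task pays a fixed scheduling overhead. The function names, the speeds, and the overhead value are all hypothetical and chosen only for illustration.

```python
import heapq


def static_makespan(total_work, speeds):
    """Even split: each node is bound to total_work / n units up front.

    The job finishes when the slowest node finishes (the straggler).
    """
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(total_work, speeds, num_subtasks, overhead=0.0):
    """Late binding: whichever node is free next pulls the next sub-task.

    `overhead` is an assumed fixed cost paid once per sub-task.
    """
    size = total_work / num_subtasks
    # Min-heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_subtasks):
        t, i = heapq.heappop(free_at)
        t += overhead + size / speeds[i]
        finish = max(finish, t)
        heapq.heappush(free_at, (t, i))
    return finish


if __name__ == "__main__":
    hetero = [2.0, 2.0, 1.0, 1.0]  # hypothetical: two upgraded nodes, two older ones
    homo = [1.0, 1.0, 1.0, 1.0]

    print(static_makespan(120, hetero))                      # 30.0
    print(dynamic_makespan(120, hetero, 24))                 # 20.0
    print(static_makespan(120, homo))                        # 30.0
    print(dynamic_makespan(120, homo, 24, overhead=0.5))     # 33.0
```

In this toy setting, the even split on the mixed cluster is limited by the slower nodes, while pulling sub-tasks on demand lets the faster nodes absorb more of the work. On the uniform cluster the even split is already balanced, so the per-sub-task overhead only adds time, which mirrors the abstract's caution about applying sub-tasking blindly.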
doi:10.1109/bigdata.congress.2014.26 dblp:conf/bigdata/HartogDGL14 fatcat:7xkyijf3sjhtvms5evdvbjfdty