Towards efficient resource provisioning in MapReduce
Journal of Parallel and Distributed Computing
Highlights
• Overprovisioning could lead to significant waste of computing resources and energy.
• Performance gain decreases quickly beyond the best trade-off point on the elbow curve.
• Our algorithm for optimal resource provisioning is better than any rule of thumb.
• Use dynamic job profiling with a table of signatures to match optimal task resources.
• Efficient task provisioning saves energy and resources for jobs in multi-tenancy.
Abstract
The paper presents a novel approach
... nd algorithm with a mathematical formula for obtaining the exact optimal number of task resources for any workload running on Hadoop MapReduce. In the era of Big Data, energy efficiency has become an important issue for the ubiquitous Hadoop MapReduce framework. However, the question of how many tasks a job needs to get the most efficient performance from MapReduce still has no definite answer. Our algorithm for optimal resource provisioning allows users to identify the best trade-off point between performance and energy efficiency on the runtime elbow curve, which is fitted from sampled executions on the target cluster for subsequent behavioral replication. Our verification and comparison show that the well-known rules of thumb for calculating the required number of reduce tasks are inaccurate and could lead to significant waste of computing resources and energy with no further improvement in execution time.
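To make the notion of a trade-off point on a fitted elbow curve concrete, the following is a minimal sketch and not the formula derived in this paper. It assumes, purely for illustration, that runtime follows T(n) = a + b/n in the number of tasks n with b > 0, and it locates the elbow with a generic heuristic: the point farthest from the chord joining the curve's endpoints after both axes are scaled to [0, 1]. The function names, the model form, and the sample measurements are all hypothetical.

```python
def fit_inverse_curve(tasks, runtimes):
    """Least-squares fit of T(n) = a + b / n (an assumed model form)."""
    xs = [1.0 / n for n in tasks]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(runtimes) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, runtimes))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def elbow_point(a, b, n_min, n_max):
    """Task count farthest from the chord between the curve's endpoints.

    Both axes are scaled to [0, 1], so the chord runs from (0, 1) to (1, 0)
    and the distance to it is proportional to |x + y - 1|.
    Assumes b > 0 and n_max > n_min.
    """
    def runtime(n):
        return a + b / n

    t_hi, t_lo = runtime(n_min), runtime(n_max)
    best_n, best_d = n_min, -1.0
    for n in range(n_min, n_max + 1):
        x = (n - n_min) / (n_max - n_min)
        y = (runtime(n) - t_lo) / (t_hi - t_lo)
        d = abs(x + y - 1.0)
        if d > best_d:
            best_n, best_d = n, d
    return best_n


if __name__ == "__main__":
    # Hypothetical sampled executions: (number of tasks, runtime in seconds).
    tasks = [1, 2, 4, 8, 16, 32]
    runtimes = [1100.0, 600.0, 350.0, 225.0, 162.5, 131.25]
    a, b = fit_inverse_curve(tasks, runtimes)
    print("fitted a, b:", a, b)
    print("elbow at n =", elbow_point(a, b, 1, 32))
```

Beyond the elbow returned by such a heuristic, each additional task buys progressively less runtime reduction, which is the behavior the highlights describe. A different model form or elbow criterion would move the point.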