Performance Implications from Sizing a VM on Multi-core Systems: A Data Analytic Application's View

Seung-Hwan Lim, James Horey, Yanjun Yao, Edmon Begoli, Qing Cao
2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum
In this paper, we present a quantitative performance analysis of data analytics applications running on multicore virtual machines. Such environments form the core of cloud computing. In addition, data analytics applications, such as Cassandra and Hadoop, are becoming increasingly popular on cloud computing platforms. This convergence necessitates a better understanding of the performance and cost implications of such hybrid systems. For example, the very first step in hosting applications in virtualized environments requires the user to configure the number of virtual processors and the size of memory. To understand the performance implications of this step, we benchmarked three Yahoo Cloud Serving Benchmark (YCSB) workloads in a virtualized multi-core environment. Our measurements indicate that the performance of Cassandra for YCSB workloads does not heavily depend on the processing capacity of a system, while the size of the data set relative to allocated memory is critical to performance. We also identified a strong relationship between the running time of workloads and various hardware events (last-level cache loads, misses, and CPU migrations). From this analysis, we provide several suggestions to improve the performance of data analytics applications running in cloud computing environments.
doi:10.1109/ipdpsw.2013.97 dblp:conf/ipps/LimHYBC13 fatcat:pfqsgi43lbeitgn23d4reguasq