Characterizing the Performance of Parallel Applications on Multi-socket Virtual Machines
2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
In this paper we characterize how scientific computing applications running in virtualized environments behave with respect to memory locality management. Current solutions (KVM and Xen) enforce NUMA locality by pinning virtual machines to CPUs and providing NUMA-aware allocation in the hypervisor. Our analysis shows that, due to two-level memory management and a lack of integration with page reclamation mechanisms, applications running on warm VMs suffer from a "leakage" of page locality. Our results using MPI, UPC and OpenMP implementations of the NAS Parallel Benchmarks, running on Intel and AMD NUMA systems, indicate that applications observe an overall average performance degradation of 55% compared to native execution. Runs on "cold" VMs suffer an average performance degradation of 27%, while subsequent runs are roughly 30% slower than the cold runs. We quantify the impact of locality improvement techniques designed for full virtualization environments: hypervisor-level page remapping and partitioning the NUMA domains between multiple virtual machines. Our analysis shows that hypervisor-only schemes have little or no potential for performance improvement. When the programming model allows it, system partitioning with proper VM and runtime support is able to reproduce native performance: in a partitioned system with one virtual machine per socket, the average workload performance is 5% better than native.