Performance evaluation of containers and virtual machines when running Cassandra workload concurrently

Sogand Shirinbab, Lars Lundberg, Emiliano Casalicchio
2020 Concurrency and Computation  
NoSQL distributed databases are often used as Big Data platforms. To provide efficient resource sharing and cost effectiveness, such distributed databases typically run concurrently on a virtualized infrastructure that could be implemented using hypervisor-based virtualization or container-based virtualization. Hypervisor-based virtualization is a mature technology but imposes overhead on CPU, networking, and disk. Recently, by sharing the operating system resources and simplifying the ... t of applications, container-based virtualization is getting more popular. This article presents a performance comparison between multiple instances of VMware VMs and Docker containers running concurrently. Our workload models a real-world Big Data Apache Cassandra application from Ericsson. As a baseline, we evaluated the performance of Cassandra when running on the nonvirtualized physical infrastructure. Our study shows that Docker has lower overhead compared with VMware; the performance on the container-based infrastructure was as good as on the nonvirtualized. Our performance evaluations also show that running multiple instances of a Cassandra database concurrently affected the performance of read and write operations differently; for both VMware and Docker, the maximum number of read operations was reduced when we ran several instances concurrently, whereas the maximum number of write operations increased when we ran instances concurrently.

KEYWORDS
Cassandra, cloud computing, containers, performance evaluation, virtual machine

INTRODUCTION

Hypervisor-based virtualization began several decades ago and since then it has been widely used in cloud computing. Hypervisors, also called virtual machine monitors, share the hardware resources of a physical machine between multiple virtual machines (VMs). By virtualizing system resources such as CPUs, memory, and interrupts, it became possible to run multiple operating systems (OS) concurrently. The most commonly used hypervisors are kernel virtual machine (KVM), Xen Server, VMware, and Hyper-V. Hypervisor-based virtualization enables new features such as performance management, elastic resource scaling, and reliability services without requiring modifications to applications or operating systems. It also enables VM migration for load balancing and for consolidation to improve resource utilization and energy efficiency. However, hypervisor-level virtualization introduces performance overheads [1-5]. This overhead limits the use of hypervisor-level virtualization in performance critical domains [6-8].

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
doi:10.1002/cpe.5693 fatcat:lrnuq72ra5h45pkt66dlembhwq