I2Map: Cloud Disaster Recovery Based on Image-Instance Mapping [chapter]

Shripad Nadgowda, Praveen Jayachandran, Akshat Verma
2013 Lecture Notes in Computer Science  
Virtual machines (VMs) in a cloud use standardized 'golden master' images, a standard software catalog, and standard management tools. This facilitates quick provisioning of VMs and helps reduce the cost of managing the cloud by reducing the need for specialized software skills. However, knowledge of this similarity is lost post-provisioning, as VMs could experience different changes and may drift away from one another. In this work, we propose the I2Map system, which maintains a mapping between each instance and the golden master image from which it was created, consisting of a record of all changes to the instance since provisioning. We argue that this mapping can aid several cloud management activities such as disaster recovery, system administration, and troubleshooting. We build a host-based disaster recovery solution based on I2Map, which is ideally suited for low-cost cloud VMs that do not have access to dedicated block-based storage recovery solutions. Our solution deduplicates changes across VMs and needs to replicate only the unique changes, significantly reducing replication traffic on end hosts. We demonstrate that I2Map is able to deliver on tight recovery time and recovery point objectives of the order of minutes with low overhead. Compared to state-of-the-art host-based recovery solutions, I2Map is able to save 50-87% network bandwidth on the primary data center.

D. Eyers and K. Schwan (Eds.): Middleware 2013, LNCS 8275, pp. 204-225, 2013. © IFIP International Federation for Information Processing 2013

Host-based solutions [4, 16] are cheaper and less complex than storage-based or network-based solutions, as they can be implemented completely in software and do not require any specialized hardware. They are usually file-based and asynchronous, and work by trapping write changes and forwarding them to the replication target. However, their overheads are typically higher, and their performance worse, than those of the other two approaches.

The core idea of this work is to leverage the similarity of virtual machines (VMs) in a data center to provide a low-cost host-based disaster recovery solution. A few standardized 'golden master' images are used to provision VMs in a cloud, to ensure quick provisioning and to reduce management costs. Hence, VMs that are provisioned from the same golden master tend to be similar to one another.
However, knowledge of this similarity is lost post-provisioning, as instances could be used for different purposes and may drift away from one another. We build I2Map, which maintains a record of all changes to an instance, as a mapping between the instance and the golden master image from which it was provisioned. A lightweight agent running on each VM records all changes and transmits them to a set of aggregators. The aggregators deduplicate these changes across VMs, store only the unique changes, and maintain the mapping for each VM. A snapshot-mirroring technique can then be applied to back up the aggregators onto a remote site. This recovery process allows us to trade off recovery performance for cost.

We evaluate I2Map on representative activities such as installing new software, patching the operating system, and running Hadoop-based applications. We demonstrate that individual VMs can receive good recovery performance of a few minutes without having to invest in dedicated and specialized hardware. We conduct a 24-hour high-load case study in which we recover a failed VM within a recovery time of 20 minutes and with a recovery point of less than 4 minutes. We show that I2Map uses 50-87% less network bandwidth on the primary data center than state-of-the-art host-based recovery solutions.

The image-instance mapping can potentially be used for other applications such as system administration or troubleshooting failures. We discuss these as part of future work in Section 7. For the rest of this paper, we focus on the disaster recovery solution.

The rest of this paper is organized as follows. We provide some background and motivate our problem and solution in Section 2. We present the design of I2Map in Section 3. Section 4 describes our implementation of I2Map and certain optimizations we performed. We evaluate I2Map and report the results in Section 5.
Section 6 discusses related work, and Section 7 highlights the limitations and other potential applications of I2Map. We finally conclude this paper in Section 8.
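
As an illustration of the agent/aggregator flow described above (agents record changes, aggregators keep only unique changes and a per-VM mapping against the golden master), the following is a minimal sketch and not the authors' implementation. It assumes file-level changes identified by a SHA-256 content digest and ignores deletions. The names `Aggregator`, `report_change`, and `recover`, and the representation of an image as a dictionary of path to bytes, are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Aggregator:
    """Hypothetical aggregator: stores each unique payload once and keeps,
    per VM, a mapping of changed path -> content digest."""
    store: dict = field(default_factory=dict)     # digest -> payload bytes
    mappings: dict = field(default_factory=dict)  # vm_id -> {path: digest}
    bytes_received: int = 0                       # payload bytes actually sent

    def has(self, digest: str) -> bool:
        return digest in self.store

    def record(self, vm_id: str, path: str, digest: str, payload=None) -> None:
        if payload is not None:
            # First time this content is seen: keep the payload.
            self.store[digest] = payload
            self.bytes_received += len(payload)
        elif digest not in self.store:
            raise KeyError(f"payload for {digest} was never sent")
        self.mappings.setdefault(vm_id, {})[path] = digest

    def recover(self, vm_id: str, golden_image: dict) -> dict:
        """Rebuild a VM's files: golden master plus its recorded changes."""
        files = dict(golden_image)
        for path, digest in self.mappings.get(vm_id, {}).items():
            files[path] = self.store[digest]
        return files


def report_change(agg: Aggregator, vm_id: str, path: str, data: bytes) -> None:
    """Agent side (assumed protocol): offer the digest first and send the
    payload only if the aggregator does not already hold that content."""
    digest = hashlib.sha256(data).hexdigest()
    agg.record(vm_id, path, digest, None if agg.has(digest) else data)


if __name__ == "__main__":
    golden = {"/etc/os-release": b"v1"}
    agg = Aggregator()
    patch = b"patched-libc"
    report_change(agg, "vm-a", "/lib/libc.so", patch)       # payload sent
    report_change(agg, "vm-b", "/lib/libc.so", patch)       # digest only
    report_change(agg, "vm-b", "/home/app.conf", b"port=8080")
    print(agg.bytes_received, sorted(agg.recover("vm-b", golden)))
```

In this sketch the same patch applied to two VMs is transferred once, which is the kind of cross-VM redundancy the paper's bandwidth savings rely on; the actual change granularity and wire protocol of I2Map are not specified in this excerpt.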
doi:10.1007/978-3-642-45065-5_11 fatcat:ax4v7vx6w5ecbimrdbfdyv3jyi