Phantasy: Low-Latency Virtualization-based Fault Tolerance via Asynchronous Prefetching

Shiru Ren, Yunqi Zhang, Lichen Pan, Zhen Xiao
2018 IEEE Transactions on Computers
Fault tolerance has become increasingly critical for virtualized systems as a growing number of mission-critical applications are now deployed on virtual machines rather than directly on physical machines. However, prior hardware-based fault-tolerant systems require extensive modification to existing hardware, which makes them infeasible for industry practitioners. Although software-based techniques realize fault tolerance without any hardware modification, they suffer from significant latency overhead that is often orders of magnitude higher than acceptable. To realize practical low-latency fault tolerance in the virtualized environment, we first identify two bottlenecks in prior approaches, namely the overhead of tracking dirty pages in software and the long sequential dependency in checkpointing system states. To address these bottlenecks, we design a novel mechanism that asynchronously prefetches dirty pages without disrupting the primary VM's execution, thereby shortening the sequential dependency. We then develop Phantasy, a system that leverages the page-modification logging (PML) technology available on commodity processors to reduce the dirty page tracking overhead and asynchronously prefetches dirty pages through remote direct memory access (RDMA). Evaluating on 25 real-world applications, we demonstrate that Phantasy reduces performance overhead by 38% on average and latency by 85% compared to a state-of-the-art virtualization-based fault-tolerant system.
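The effect of asynchronous prefetching on the checkpoint's sequential dependency can be illustrated with a toy model. The sketch below is not Phantasy's implementation: it does not model PML or RDMA, and the function name, the `prefetch_every` parameter, and the write trace are all hypothetical. It only counts how many dirty pages would still need to be copied while the primary VM is paused at the end of an epoch, assuming a backup that periodically drains the dirty set while the VM keeps running.

```python
from typing import Iterable, Set


def pages_copied_while_paused(writes: Iterable[int], prefetch_every: int = 0) -> int:
    """Count pages still dirty when the VM pauses at the end of one epoch.

    writes: page numbers written by the primary VM, in order (hypothetical trace).
    prefetch_every: if > 0, an assumed backup fetches all currently dirty pages
        after every `prefetch_every` writes, without pausing the VM.
        0 disables prefetching (all copying happens during the pause).
    """
    dirty: Set[int] = set()  # pages modified since they were last copied
    for i, page in enumerate(writes, start=1):
        dirty.add(page)  # a write (re-)dirties the page
        if prefetch_every and i % prefetch_every == 0:
            dirty.clear()  # backup pulled these pages asynchronously
    return len(dirty)


if __name__ == "__main__":
    trace = [1, 2, 3, 4, 1]  # page 1 is written again after being prefetched
    print(pages_copied_while_paused(trace))                    # no prefetching
    print(pages_copied_while_paused(trace, prefetch_every=3))  # with prefetching
```

In this trace, 4 pages must be copied during the pause without prefetching and 2 with it. Page 1 is counted again in the second case because it was modified after the backup had already fetched it, which is why prefetching shortens the pause but cannot eliminate it.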
doi:10.1109/tc.2018.2865943 fatcat:ymp766plznca5eti32da43tfgi