Backup or Not: An Online Cost Optimal Algorithm for Data Analysis Jobs Using Spot Instances

Liduo Lin, Li Pan, Shijun Liu
2020 IEEE Access  
Large-scale public cloud providers have recently begun to offer spot instances. Thanks to their convenient access mode and low price, spot instances have become popular with a growing number of cloud users, especially for big data analysis jobs with high-performance computation requirements. However, spot instances are generally unstable: they may be interrupted, and an interruption leads to extra costs for job re-execution. This cost can be greatly reduced if a backup is made at the right time before an interruption. For convenience and cost efficiency, users can store backup data files for future job recovery in the StaaS (Storage-as-a-Service) offering of the same cloud provider whose spot instances they use. Because making backups too often incurs increased costs, users need to time their backup decisions with regard to when an abrupt interruption will occur. However, it is hard to know or predict precisely when such an interruption will occur. To solve this problem, we propose an online algorithm that guides cloud users in making backups when using spot instances to execute big data analysis jobs, without requiring any information about future interruptions. We prove theoretically that the proposed online algorithm guarantees a bounded competitive ratio of less than 2. Finally, extensive experiments verify the effectiveness of the online algorithm in reducing the additional cost caused by interruptions, and show that it still achieves stable cost optimization even when interruptions occur frequently.

INDEX TERMS spot instance, online algorithm, back up, abrupt termination
doi:10.1109/access.2020.3014978 fatcat:36rrpvi6dzbmphks5ifoqvemyq