Evaluating storage systems for scientific data in the cloud

Ketan Maheshwari, Justin M. Wozniak, Hao Yang, Daniel S. Katz, Matei Ripeanu, Victor Zavala, Michael Wilde
2014, Proceedings of the 5th ACM Workshop on Scientific Cloud Computing (ScienceCloud '14)
Infrastructure-as-a-Service (IaaS) clouds are an appealing resource for scientific computing. However, the bare-bones presentation of raw Linux virtual machines leaves much to the application developer. For many cloud applications, effective data handling is critical to efficient application execution. This paper investigates the capabilities of a variety of POSIX-accessible distributed storage systems to manage data access patterns resulting from workflow application executions in the cloud.
... leverage the expressivity of the Swift parallel scripting framework to benchmark the performance of a number of storage systems using synthetic workloads and three real-world applications. We characterize two representative commercial storage systems (Amazon S3 and HDFS) and two emerging research-based storage systems (Chirp/Parrot and MosaStore). We find the use of aggregated node-local resources effective and economical compared with remotely located S3 storage. Our experiments show that applications run at scale with MosaStore show up to 30% improvement in makespan compared with those run with S3. We also find that storage-system-driven application deployments in the cloud result in better runtime performance than an on-demand data-staging-driven approach.
doi:10.1145/2608029.2608034 dblp:conf/hpdc/MaheshwariWYKRZW14 fatcat:3rtqegs4rrbdxorejqj72hb76a