Guest Editors' Introduction: Special Issue on Scientific Cloud Computing

Kate Keahey, Ioan Raicu, Kyle Chard, Bogdan Nicolae
2016 IEEE Transactions on Cloud Computing  
Computational and Data-Driven Sciences have become the third and fourth pillars of scientific discovery, in addition to the experimental and theoretical sciences. Scientific Computing has transformed scientific discovery, enabling breakthroughs through new kinds of experiments and simulations that would have been impossible only a decade ago. It is key to solving grand challenges in many domains and to producing breakthroughs in new knowledge by combining the lessons and approaches from multiple
large-scale distributed computing areas, including high-performance computing (HPC), high-throughput computing (HTC), many-task computing, and data-intensive computing. Today's Big Data problems are generating datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century.

Cloud computing has shown a great deal of promise as a scalable and cost-effective computing model for supporting scientific applications. Indeed, over the last several years adoption has been swift. Cloud computing offers elastic computing capacity, virtualized resources, and pay-as-you-go billing models; these capabilities enable scientists to outsource analyses, scale to large problem sizes, and pay only for the resources they use, rather than making large upfront investments. However, many inherent challenges remain in adapting the mixed techniques of modern scientific computing to make the best use of cloud computing infrastructures, and vice versa.

This Special Issue on Scientific Cloud Computing in the IEEE Transactions on Cloud Computing provides an opportune forum for presenting new research, development, and deployment efforts that address conducting scientific analyses on cloud computing infrastructures. The special issue is timely: scientific computing on clouds is growing rapidly, and in many cases a lack of technological advancement limits the efficiency with which these applications execute on the cloud. The importance of this area is reflected in the strong participation in the special issue: we received 41 submissions, of which eight were selected. The selected papers contribute important advances toward leveraging clouds for scientific applications. The contributions span a broad range of topics, including performance modeling and optimization, data management, resource allocation and scheduling, elasticity, reconfiguration, and cost prediction and optimization. Most papers present general techniques and approaches that are agnostic of the application, while two contributions demonstrate how domain-specific scientific applications can be migrated to the cloud.

While clouds provide access to enormous on-demand computing resources, challenges arise when attempting to scale analyses automatically and efficiently, from both a performance and a cost perspective. Righi et al. propose AutoElastic, a transparent, Platform-as-a-Service-level approach for elastically scaling HPC applications on clouds. This approach enables applications to scale without user intervention or source-code modification. Chen et al. present a complementary system, Ensemble, that constructs performance models for applications running in clouds. The resulting performance models can be used by systems such as AutoElastic to optimize provisioning and allocation.

Scientific workflows are one of the most common methods for running scientific applications on clouds. As such, several of the papers in this special issue address challenges faced by workflow systems. Zhou et al. investigate the economic landscape of running scientific workflows on clouds and present a scheduling system, with associated cost optimizations, that minimizes expected cost under user-specified probabilistic deadlines. In response to the challenges of running workflows in the presence of failures, Chen et al.
present a theoretical analysis of the impact of failures on the runtime performance of scientific workflows. They apply a general task-failure modeling approach to estimate performance under failure, and they present three fault-tolerant task clustering strategies as well as a dynamic clustering strategy that improves performance by adjusting the granularity of clusters based on the failure rate.

New data management and transfer approaches are needed to move data efficiently and reliably to and from the cloud and between cloud instances. In this space, Yildirim et al. focus on optimizing large data transfers composed of heterogeneous file sizes in heterogeneous environments. Tudoran et al. propose OverFlow, a data management system that runs across geographically distributed sites and equips large-scale scientific applications with a set of tools to monitor and perform low-level manipulations of data (e.g., compression, de-duplication, geo-replication), enabling them to achieve a desired performance-cost trade-off.

Finally, two articles address the challenges of migrating scientific applications to the cloud. First, Frattini et al.
DOI: 10.1109/TCC.2015.2505022