Auspice: Automatic Service Planning in Cloud/Grid Environments [chapter]

David Chiu, Gagan Agrawal
2011 Grids, Clouds and Virtualization  
Scientific advancements have ushered in staggering amounts of available data and processes which are now scattered across various locations in the Web, Grid, and more recently, the Cloud. These processes and data sets are often semantically loosely-coupled and must be composed together piecemeal to generate scientific workflows. Understanding how to design, manage, and execute such data-intensive workflows has become increasingly esoteric, confined to a few scientific experts in the field.
Despite the development of scientific workflow management systems, which have simplified workflow planning to some extent, a means to reduce the complexity of user interaction without forfeiting robustness has been elusive. This violates the essence of scientific progress, where information should be accessible to anyone. A high-level querying interface, akin to common search engines, that can not only return a relevant set of scientific workflows but also facilitate their execution may be highly beneficial to users. The development of such a system, which abstracts the complex task of scientific workflow planning and execution from the user, is reported herein. Our system, Auspice: AUtomatic Service Planning In Cloud/Grid Environments, makes the following key contributions. First, a two-level metadata management framework is introduced. At the top level, Auspice captures semantic dependencies among available, shared processes and data sets with an ontology. The system furthermore indexes these shared resources to enable fast planning. This metadata framework enables an automatic workflow composition algorithm, which exhaustively enumerates relevant scientific workflow plans given a few key parameters, a marked departure from requiring users to design and manage workflow plans. By applying models to processes, time-critical and accuracy-aware constraints can be realized in this planning algorithm. During the planning phase, Auspice projects the time and accuracy costs of workflow plans and prunes, a priori, those that cannot meet the specified constraints. Conversely, when feasible, Auspice can adapt to certain time constraints by trading accuracy for time. To simplify user interaction, both natural language and keyword search interfaces have been developed to invoke this workflow planning algorithm. Intermediate data caching strategies have also been implemented to accelerate workflow execution over emerging Cloud environments.
A focus on cache elasticity is reported, and to this end, we have developed methods to scale and relax resource provisioning for cooperating data caches. Finally, costs of supporting such data caches over various Cloud storage and compute resources have been evaluated.

To my family, friends, and mentors.

ACKNOWLEDGMENTS

This work is far from complete without proper acknowledgment of those who supported me over the years.

My family. When my parents made the decision to move to Ohio from Taiwan over twenty years ago, it was not the ideal situation for either of them. My mother spent tireless hours teaching me English and we struggled through heaps of schoolwork together. My father kept the family afloat through his ongoing work overseas. He also brought me up with great values in life, including humility and patience. Without their sacrifice and guidance, my education would have been impossible. I also thank my sister, Jenn, who has always been my partner in crime in childhood and now.

My wife, Michelle, for her infinite patience and unwavering love and understanding during even the most trying times. I am thankful for the many drop-offs and pick-ups to/from Dreese Labs she has had to perform on early mornings and late evenings. I respect her deeply, and she is a great inspiration to me.
doi:10.1007/978-0-85729-049-6_5 fatcat:prikikmcpfdx3ps7554efsh4ti