Methods and Experiences for Developing Abstractions for Data-intensive, Scientific Applications
Developing software for scientific applications that require the integration of diverse types of computing, instruments, and data present challenges that are distinct from commercial software. These applications require scale, and the need to integrate various programming and computational models with evolving and heterogeneous infrastructure. Pervasive and effective abstractions for distributed infrastructures are thus critical; however, the process of developing abstractions for scientific
... lications and infrastructures is not well understood. While theory-based approaches for system development are suited for well-defined, closed environments, they have severe limitations for designing abstractions for scientific systems and applications. The design science research (DSR) method provides the basis for designing practical systems that can handle real-world complexities at all levels. In contrast to theory-centric approaches, DSR emphasizes both practical relevance and knowledge creation by building and rigorously evaluating all artifacts. We show how DSR provides a well-defined framework for developing abstractions and middleware systems for distributed systems. Specifically, we address the critical problem of distributed resource management on heterogeneous infrastructure over a dynamic range of scales, a challenge that currently limits many scientific applications. We use the pilot-abstraction, a widely used resource management abstraction for high-performance, high throughput, big data, and streaming applications, as a case study for evaluating the DSR activities. For this purpose, we analyze the research process and artifacts produced during the design and evaluation of the pilot-abstraction. We find DSR provides a concise framework for iteratively designing and evaluating systems. Finally, we capture our experiences and formulate different lessons learned.