Poster reception---Harnessing grid resources to enable the dynamic analysis of large astronomy datasets

Ioan Raicu, Ian Foster, Alex Szalay
2006 Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06  
Grid computing has emerged as an important new field focusing on large-scale resource sharing and high-performance orientation. The astronomy community has an abundance of imaging datasets at its disposal, which are essentially its "crown jewels." However, these datasets are generally terabytes in size and contain hundreds of millions of objects separated into millions of files, factors that make many analyses impractical to perform on small computers. The question we answer in this paper is: "How can we leverage Grid resources to make the analysis of large astronomy datasets a reality for the astronomy community?" Our answer is "AstroPortal," a gateway to grid resources tailored for the astronomy community. AstroPortal is a Web Services-based system that uses grid computing to federate large computing and storage resources for dynamic analysis of large datasets. Building on the Globus Toolkit 4, we have built an AstroPortal prototype and implemented a first analysis, "stacking," that sums multiple regions of the sky, a function that can help both identify variable sources and detect faint objects. We have deployed AstroPortal on the TeraGrid distributed infrastructure and applied the stacking function to the Sloan Digital Sky Survey (SDSS) DR4, which comprises about 300 million objects dispersed over 1.3 million files, a total of 3 terabytes of compressed data, with promising results. AstroPortal gives the astronomy community a new tool to advance its research and opens the door to analyses that were not previously practical at such a large scale. Furthermore, we have found that data locality in distributed computing applications is important for the efficient use of the underlying resources. We outline a storage hierarchy that could make more efficient use of the available resources and could potentially offer orders-of-magnitude speedups in the analysis of large datasets.
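The "stacking" analysis described in the abstract, summing multiple regions of the sky, can be illustrated with a minimal sketch. This is not the AstroPortal implementation: the function name stack_cutouts and the representation of each sky region as a small 2-D list of pixel values are assumptions made only for illustration, and the pixel values are invented toy numbers.

```python
def stack_cutouts(cutouts):
    """Sum equally sized 2-D cutouts pixel by pixel and return the stacked image.

    `cutouts` is a hypothetical in-memory stand-in for image regions that a
    real system would read from survey files.
    """
    if not cutouts:
        raise ValueError("need at least one cutout")
    rows, cols = len(cutouts[0]), len(cutouts[0][0])
    stacked = [[0.0] * cols for _ in range(rows)]
    for cutout in cutouts:
        # Every region must have the same shape for a pixel-wise sum.
        if len(cutout) != rows or any(len(row) != cols for row in cutout):
            raise ValueError("all cutouts must have the same shape")
        for i, row in enumerate(cutout):
            for j, value in enumerate(row):
                stacked[i][j] += value
    return stacked


# Toy data: a weak central source that is present in every cutout, while
# the surrounding pixels fluctuate in sign from one cutout to the next.
cutouts = [
    [[0.5, -0.5, 0.5], [-0.5, 1.0, 0.5], [0.5, -0.5, -0.5]],
    [[-0.5, 0.5, -0.5], [0.5, 1.0, -0.5], [-0.5, 0.5, 0.5]],
    [[0.5, 0.5, -0.5], [-0.5, 1.0, 0.5], [-0.5, -0.5, 0.5]],
]
stacked = stack_cutouts(cutouts)
```

In the toy data the central pixel accumulates across cutouts while the fluctuating background largely cancels, which is the intuition behind using stacking to detect faint objects.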
doi:10.1145/1188455.1188611 dblp:conf/sc/RaicuFS06 fatcat:l5mjdomo4rd45em2dz4ib52pyq