High-speed distributed data handling for high-energy and nuclear physics [article]

A Shoshani, W E Johnston, W H Greiman, C E Tull, B L Tierney
1997
The advent (and promise) of shared, widely available, high-speed networks provides the potential for new approaches to the collection, organization, storage, and analysis of high-speed and high-volume data streams from on-line instruments. Such data streams originate from many types of on-line instruments and imaging systems, and are a "staple" of modern scientific, health care, and intelligence environments. We are defining and implementing an approach that provides for real-time analysis,
cataloguing, and archiving of the data streams through the integration of data management techniques, a high-speed distributed application-oriented cache, distributed high-performance applications, and transparent management of tertiary storage systems. In the "Data Access and Analysis of Massive Datasets for High-Energy and Nuclear Physics" project (a Grand Challenge project of DOE, Energy Research, Mathematical, Information, and Computational Sciences Division) we are addressing the issues of organizing and querying massive data sets. In our data-intensive computing projects we are addressing issues associated with capture, processing, cataloguing/indexing, and tertiary storage management. In our high-speed, widely distributed computing project we are addressing the technology and architectures needed to support widely dispersed resources, users, and data sources, all having location-transparent access to the data and resources through the use of parallel-distributed computing and high-speed wide area networks. This type of problem (dealing with high-volume, high-rate data streams from instruments, the associated data management problems for the resulting massive data sets, and widely distributed user communities) is a key issue for modern, large-scale science.

THE HENP GRAND CHALLENGE

Advances in computational capabilities, information management, and multi-user data access are essential if the next generation of experiments in both high-energy and nuclear physics are to be able to fully address the forefront scientific issues for which they are designed. Among these forefront issues are two of the most fundamental questions facing high-energy and nuclear physics today, namely the characterization of the transition to the Quark-Gluon Plasma (QGP) phase of matter and the discovery of the mechanism responsible for electro-weak symmetry breaking. These experiments will record and analyze data from physics events of unprecedented complexity.
The resulting data streams of up to tens of megabytes per second and the requirements for "data mining" in huge (tens of terabytes) data sets by multiple, geographically distributed teams of scientists set the scale of this Grand Challenge proposal. Simple extrapolations of existing techniques will not be sufficient; new
doi:10.5170/cern-1997-008.85 fatcat:4otdbenvd5gtxjokxdrnxhls2m