The National Scalable Cluster Project: Three Lessons about High Performance Data Mining and Data Intensive Computing
The National Scalable Cluster Project (NSCP), a collaboration of research groups, has pioneered the application of cluster computing and high performance wide area networks to a variety of problems in data mining and data intensive computing, including work with several terabyte-size collections. The core research groups, from the University of Illinois at Chicago and the University of Pennsylvania, work with collaborators at over ten institutions. The project was founded in 1994 by Grossman and
... llebeek. The NSCP collaborators have assembled local clusters of workstations and connected these local clusters into wide area clusters of clusters, or Meta-Clusters. NSCP has also developed several software packages for data intensive computing on the Meta-Cluster. The NSCP-1 Meta-Cluster was completed in 1996 and linked geographically distributed clusters using the commodity Internet. The NSCP-2 Meta-Cluster was completed in 1998 and used OC-3 networks to link the clusters. The NSCP-1 and NSCP-2 Meta-Clusters have been used by a variety of scientists and engineers working on applications in high energy physics, computational chemistry, nonlinear simulation, bioinformatics, medical imaging, network traffic analysis, digital libraries of video data, and economic data. An NSCP-3 Meta-Cluster is currently being designed and is tentatively scheduled for deployment in 2001. It is intended to exploit wavelength division multiplexing (WDM) technology, which is now being used to greatly increase the available bandwidth on links connecting geographically distributed nodes by packing many wavelengths, each carrying a separate data stream, onto a single fiber. Currently, the NSCP consists of approximately 100 nodes and 3 terabytes of disk geographically distributed among the participating sites and connected by laboratory, campus, and national ATM networks. NSCP-developed and third-party software is provided so that applications can transparently access as many nodes and as much disk as required. NSCP supports one large digital library (500 gigabytes), two moderate-size digital libraries (100 gigabytes each), and several smaller ones. More details can be found at http://nscp.upenn.edu and http://www.nscp.uic.edu.