Enabling Department-Scale Supercomputing

David S. Greenberg, William E. Hart, Cynthia A. Phillips
1999 IMA Volumes in Mathematics and its Applications  
The Department of Energy (DOE) national laboratories have one of the longest and most consistent histories of supercomputer use. We summarize the architecture of DOE's new supercomputers that are being built for the Accelerated Strategic Computing Initiative (ASCI). We then argue that in the near future scaled-down versions of these supercomputers with petaflop-per-weekend capabilities could become widely available to hundreds of research and engineering departments. The availability of such computational resources will allow simulation of physical phenomena to become a full-fledged third branch of scientific exploration, along with theory and experimentation. We describe the ASCI and other supercomputer applications at Sandia National Laboratories, and discuss which lessons learned from Sandia's long history of supercomputing can be applied in this new setting.

... and development of the algorithms, applications, hardware, systems software, and tools needed to implement science-based stockpile stewardship. In Section 2, we summarize the architectures of the new massively parallel machines under development for the ASCI program. We believe the lessons learned from the supercomputing efforts within DOE have applicability far beyond the scope of ASCI supercomputing. In Section 3 we argue that systems with supercomputer-level performance (though not yet ASCI-level performance) could soon become available to the department-scale research group. That is, by tightly networking commodity components, academic and industrial research departments should be able to afford petaflop-per-weekend performance by the year 2000. These machines could be built incrementally, with minimal funding impact, by using "Stone Soup" tactics. In analogy with the children's story, each researcher who wants to use the department's supercomputer will contribute something to the pot: a few more processors, some interconnect, etc.¹

We review some major supercomputer applications at Sandia National Laboratories (Section 4) to illustrate the capabilities of these machines. We argue that the general methods computational scientists at Sandia use to achieve maximum performance on these machines are broadly applicable, particularly for distributed-memory machines like the ASCI Red machine. Section 5 describes some of the lessons learned at Sandia as we have advanced from prototype high-performance computers such as the nCUBE and the Paragon to ASCI-class machines.
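The petaflop-per-weekend target translates into a surprisingly modest sustained rate. A rough sanity check (the 48-hour weekend and the 25 MF per-node sustained rate below are illustrative assumptions, not figures from the text):

```python
# Back-of-the-envelope check of the "petaflop per weekend" target.
# Assumption: a weekend is 48 hours of uninterrupted running.
PFLOP = 1e15                       # total floating-point operations
weekend_sec = 48 * 3600            # seconds in a 48-hour weekend

sustained = PFLOP / weekend_sec    # required sustained rate, flops/sec
print(f"sustained rate needed: {sustained / 1e9:.1f} GF")

# Illustrative only: if each commodity node sustains ~25 MF (an assumed
# fraction of a late-1990s processor's peak), how many nodes suffice?
per_node = 25e6
print(f"nodes needed at 25 MF/node: {sustained / per_node:.0f}")
```

That is, a few hundred commodity nodes running over a weekend reach the petaflop mark, which is what makes the department-scale "Stone Soup" machine plausible.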
Some of these lessons can be applied to these new mini-supercomputers, but in some cases there is still much to be learned. In particular, we consider issues of the usage model, programming model, resource management, data movement, system reliability, and code evolution. Section 6 offers some concluding remarks.

ASCI Supercomputers

In the 1990s, researchers at various DOE laboratories used many high-performance machines, including Paragons, Cray T3Ds, SP2s, nCUBEs, and CM-5s. From 1994 to 1997, the primary machine for large simulations at Sandia National Laboratories was an Intel Paragon. This machine has over 1800 nodes, each consisting of two i860 XP processors, which operate at 75 megaflops (MF) each. When one processor on each node is used as a communication coprocessor, as per the original design, this yields a peak performance of 140 gigaflops (GF). Sandia enabled the second processor to be used for computation as well; though that processor is hampered by low memory bandwidth, this allowed some applications to exceed the advertised peak performance of the machine. The nodes are arranged in a 16 × 120 mesh, with 5 I/O columns in the middle. Communication links can move 200 megabytes per second (MB/sec) in each direction. The machine has 37 gigabytes (GB) of RAM and 330 GB of disk space.

At Sandia, we have recently installed the ASCI Red machine. Sandia and Intel have expanded the ideas proven successful in the Paragon to create a commodity-based supercomputer. Though the CM-5 used SPARC processors for non-floating-point computation [21] and the Cray T3D uses Alpha chips [5], this is the first machine where true supercomputing performance for scientific computing is delivered by processors that will also be used in millions of PCs. Where the Paragon used the end-of-the-line i860, an embedded processor, the ASCI Red machine uses the mainstream Pentium Pro™ processor.
Over 9000 Pentium Pro™ processors, each of which provides 200 MF peak, are tightly integrated to produce a total peak performance of 1.8 teraflops (TF). The machine sustained 1.3 TF on the MPLinpack benchmark in June 1997.

¹ Recently several researchers at Oak Ridge National Laboratory began an attempt to apply this model literally by collecting equipment scheduled for reapplication [24].
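The peak figures quoted for both machines follow from simple multiplication; the sketch below checks them (the node and processor counts are the approximate round numbers from the text, so the Paragon result comes out slightly under the quoted 140 GF, which reflects somewhat more than 1800 nodes):

```python
# Peak-performance arithmetic for the two machines described above.
MF = 1e6  # one megaflop, in flops/sec

# Intel Paragon: ~1800+ nodes, two 75 MF i860 XP processors per node.
paragon_nodes = 1800
paragon_single = paragon_nodes * 75 * MF       # one compute CPU per node
paragon_dual = paragon_nodes * 2 * 75 * MF     # both CPUs computing
print(f"Paragon, 1 CPU/node: {paragon_single / 1e9:.0f} GF")
print(f"Paragon, 2 CPUs/node: {paragon_dual / 1e9:.0f} GF")

# ASCI Red: over 9000 Pentium Pro processors at 200 MF peak each.
red_peak = 9000 * 200 * MF
print(f"ASCI Red peak: {red_peak / 1e12:.1f} TF")

# Sustained 1.3 TF on MPLinpack against the 1.8 TF peak.
print(f"MPLinpack efficiency: {1.3e12 / red_peak:.0%}")
```

Enabling the Paragon's second processor roughly doubles the per-node peak, which is why some applications could exceed the machine's advertised 140 GF even with the second processor's low memory bandwidth.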
doi:10.1007/978-1-4612-1516-5_15