To upgrade or not to upgrade? Catamount vs. Cray Linux Environment

S.D. Hammond, G.R. Mudalige, J.A. Smith, J.A. Davis, S.A. Jarvis, J. Holt, I. Miller, J.A. Herdman, A. Vadgama
2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)
Modern supercomputers are growing in diversity and complexity: the arrival of technologies such as multicore processors, general-purpose GPUs and specialised compute accelerators has increased the potential scientific delivery possible from such machines. This is not, however, without some cost, including significant increases in the sophistication and complexity of supporting operating systems and software libraries. This paper documents the development and application of methods to assess the potential performance of selecting one hardware, operating system (OS) and software stack combination against another. This is of particular interest to supercomputing centres, which routinely examine prospective software/architecture combinations and possible machine upgrades. A case study is presented that assesses the potential performance of a particle transport code on AWE's Cray XT3 8,000-core supercomputer running images of the Catamount and the Cray Linux Environment (CLE) operating systems. This work demonstrates that by running a number of small benchmarks on a test machine and network, and observing factors such as operating system noise, it is possible to speculate as to the performance impact of upgrading from one operating system to another on the system as a whole. This use of performance modelling represents an inexpensive method of examining the likely behaviour of a large supercomputer before and after an operating system upgrade; this method is also attractive if it is desirable to minimise system downtime while exploring software-system upgrades. The results show that benchmark tests run on fewer than 256 cores would suggest that the impact (overhead) of upgrading the operating system to CLE was less than 10%; model projections suggest that this is not the case at scale.
doi:10.1109/ipdpsw.2010.5470885 dblp:conf/ipps/HammondMSDJHMHV10 fatcat:vfa5kkiiozep5dm3q6qtudyg2m