A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2006; you can also visit the original URL.
The file type is application/pdf.
Filtering Failure Logs for a BlueGene/L Prototype
2005 International Conference on Dependable Systems and Networks (DSN'05)
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM's BlueGene/L, which can accommodate as many as 128K processors. In this paper, we present our experiences in collecting and filtering error event logs from an 8192-processor BlueGene/L prototype at IBM Rochester, which is currently ranked #8 in the Top-500 list. We analyze the logs collected from this machine over a period of 84 days starting from
doi:10.1109/dsn.2005.50
dblp:conf/dsn/LiangZSSMG05
fatcat:hribyyz6pnh57dhgpoqiewpmcu