Improved ATLAS HammerCloud Monitoring for Local Site Administration

M Böhler, J Elmsheuser, F Hönig, F Legger, V Mancinelli, G Sciacca
2015 Journal of Physics: Conference Series
Every day, hundreds of tests are run on the Worldwide LHC Computing Grid for the ATLAS and CMS experiments in order to evaluate the performance and reliability of the different computing sites. All this activity is steered, controlled, and monitored by the HammerCloud testing infrastructure. Sites with failing functionality tests are auto-excluded from the ATLAS computing grid; it is therefore essential to provide a detailed and well-organized web interface for the local site administrators such that they can easily spot and promptly solve site issues. Additional functionality has been developed to extract and visualize the most relevant information. The site administrators can now be pointed easily to major site issues that lead to site blacklisting, as well as to possible minor issues that are usually not conspicuous enough to warrant the blacklisting of a specific site but can still cause undesired effects such as a non-negligible job failure rate. This paper summarizes the different developments and optimizations of the HammerCloud web interface and gives an overview of typical use cases.
doi:10.1088/1742-6596/664/6/062004 fatcat:6zgrfg2hrvbbthud6b2vsibiqy