Smart caching at CMS: applying AI to XCache edge services

CMS Collaboration
2019 Zenodo  
The envisaged storage and compute needs for the HL-LHC will be up to a factor of 10 above what can be achieved by the evolution of current technology within a flat budget. The WLCG community is studying possible technical solutions to evolve current computing in order to cope with the requirements; one of the main focuses is resource optimization, with the ultimate objective of improving performance and efficiency as well as simplifying and reducing operation costs. As of today, storage consolidation based on a Data Lake model is considered a good candidate for addressing HL-LHC data access challenges, allowing global redundancy instead of local redundancy, dynamic adaptation of QoS, and intelligent data deployment based on cost-driven metrics. A Data Lake model under evaluation can be seen as a logical entity which hosts a distributed working set of analysis data. Compute power can be close to the lake, but also remote and thus completely external. In this context we expect data caching to play a central role as a technical solution to reduce the impact of latency and reduce network load. A geographically distributed caching layer will serve many satellite computing centers, which might appear and disappear dynamically. In this talk we propose to develop a flexible and automated AI environment for smart management of the content of clustered cache systems, to optimize hardware for the service and operations for maintenance. We demonstrate an AI-based smart caching system and discuss the implementation of training and inference facilities, along with the XCache integration with the smart decision service. Finally, we evaluate the effect on smart caches and data placement, and compare the data placement algorithm with and without the ML model.
doi:10.5281/zenodo.3598799 fatcat:g3j6k63dwrad5jnbmnob3xj5bi