An Unsupervised Learning Approach for I/O Behavior Characterization

Pablo J. Pavan, Jean Luca Bez, Matheus S. Serpa, Francieli Zanon Boito, Philippe O. A. Navaux
2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
I/O operations are a bottleneck for several applications because of the gap between processing and data access speeds. Hence, understanding I/O behavior is vital for finding problems and proposing solutions. In particular, identifying and characterizing the I/O access pattern is important, since it directly affects application performance. With this premise, we propose an I/O characterization approach that uses unsupervised learning to cluster jobs with similar I/O behavior, using information from high-level aggregated traces. As a case study, we apply our approach to four months of activity (a total of 28,938 jobs) from the Intrepid supercomputer at Argonne National Laboratory. Our experimental results show that nine access patterns represent the I/O behavior in 73% of the clusters. From these nine patterns, we learn several aspects of the I/O: most access patterns use POSIX and small requests, and most patterns access unique files. Lastly, analyzing the I/O workload over the four months, we notice that it is composed of several applications that spend a short time on I/O activity but whose total I/O time, compared to that of the other applications, represents a greater portion of the overall system's I/O.
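The abstract does not name the clustering algorithm or the exact features taken from the aggregated traces. As a rough illustration only, the sketch below assumes k-means over hypothetical per-job feature vectors (fraction of POSIX operations, fraction of small requests, fraction of accesses to unique files), chosen merely to echo the aspects mentioned above. It is not the authors' method or feature set; the job names and values are invented for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-job feature vector (NOT the paper's feature set):
# (posix_fraction, small_request_fraction, unique_file_fraction)
Vector = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(cluster: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster))


def init_centroids(points: List[Vector], k: int) -> List[Vector]:
    """Deterministic farthest-point initialization: start from the first job,
    then repeatedly add the job farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points: List[Vector], k: int, max_iter: int = 100):
    """Plain k-means: alternate assignment and centroid updates until stable."""
    centroids = init_centroids(points, k)
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each job joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


if __name__ == "__main__":
    # Invented example jobs: two POSIX/small-request/unique-file heavy jobs
    # and two jobs with the opposite profile.
    jobs = {
        "job_a": (0.95, 0.90, 0.85),
        "job_b": (0.90, 0.85, 0.90),
        "job_c": (0.10, 0.20, 0.15),
        "job_d": (0.05, 0.10, 0.20),
    }
    labels, centroids = kmeans(list(jobs.values()), k=2)
    for name, label in zip(jobs, labels):
        print(name, "-> cluster", label)
```

Each resulting cluster would then be inspected (for example, via its centroid) to describe a shared access pattern. The deterministic initialization is only there to make the sketch reproducible; a real study would also need feature scaling and a principled choice of the number of clusters.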
doi:10.1109/sbac-pad.2019.00019 dblp:conf/sbac-pad/PavanBSBN19