Data-driven spatial locality

Svetozar Miucin, Alexandra Fedorova
2018 Proceedings of the International Symposium on Memory Systems - MEMSYS '18  
Over the past decades, core speeds have improved at a much higher rate than memory bandwidth. As a result, the performance bottlenecks in modern software have shifted from computation to data transfers. Hardware caches were designed to mitigate this problem, based on the principles of temporal and spatial locality. However, with increasingly irregular access patterns in software, locality is difficult to preserve. Researchers and practitioners devote considerable time and effort to improving memory performance from the software side, either by restructuring code to make access patterns more regular, or by changing the layout of data in memory to better suit caching policies. Experts often use correlations between an algorithm's access pattern and the properties of the objects it operates on to devise new ways of laying out data in memory. Prior work has shown the memory layout design process to be largely manual, and difficult enough that a good layout can itself merit a high-level publication. Our contribution is a set of tools, techniques and algorithms for automatically extracting correlations between data and the access patterns of programs. To collect sufficiently detailed information about memory accesses, we present DINAMITE, a compiler-based access instrumentation framework. Further, we introduce access graphs, a novel representation of a program's spatial locality properties, generated from memory access traces. We use access graphs as the basis for Hierarchical Memory Layouts, a novel algorithm for estimating the performance improvements to be gained from better data layouts. Finally, we present our Data-Driven Spatial Locality techniques, which use the information gathered in the previous steps to automatically extract the correlations between data and access patterns that experts commonly use to inform better layout design.

Lay Summary

Over the past decades, the disparity between processor and main memory speeds has grown significantly. Many important applications today suffer from poor memory performance. To improve these applications, experts manually tune the placement of data in memory. This process requires a deep understanding of both the algorithm and the underlying hardware on which it runs. This work presents insights and analysis into how experts create performant memory layouts, and proposes new abstractions, algorithms and techniques to automate parts of the process.
The proposed solutions have the potential to help performance-minded engineers improve data layout in their programs.
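The abstract does not define how access graphs are built from memory access traces. As a rough illustration only, the Python sketch below assumes one plausible definition: each node is an accessed field (for example, a label recovered from an instrumented program's trace), and an edge's weight counts how often two distinct fields are accessed within a small window of each other. The names `build_access_graph` and `greedy_field_order` and the `window` parameter are hypothetical. The greedy ordering is a simple stand-in heuristic for "place frequently co-accessed fields next to each other". It is not the paper's Hierarchical Memory Layouts algorithm.

```python
from collections import Counter, deque


def build_access_graph(trace, window=1):
    """Hypothetical access graph: undirected edge weights between trace keys.

    `trace` is a sequence of hashable access keys (e.g. field names).
    Each access is paired with up to `window` immediately preceding
    accesses; pairs of distinct keys increment that edge's weight.
    """
    edges = Counter()
    recent = deque(maxlen=window)  # sliding window of previous accesses
    for key in trace:
        for prev in recent:
            if prev != key:  # ignore repeated accesses to the same field
                edges[frozenset((prev, key))] += 1
        recent.append(key)
    return edges


def greedy_field_order(edges):
    """Illustrative heuristic (not the paper's algorithm): visit edges from
    heaviest to lightest and append each unseen field, so strongly
    co-accessed fields end up adjacent in the resulting layout."""
    order = []
    # Ties on weight are broken by the sorted field names, for determinism.
    for pair, _ in sorted(edges.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        for field in sorted(pair):
            if field not in order:
                order.append(field)
    return order
```

For the toy trace `["x", "y", "x", "y", "z", "w", "z", "w", "x"]` with `window=1`, the pairs `x`/`y` and `z`/`w` each get weight 3, so the heuristic groups them as `["w", "z", "x", "y"]`. A real tool would also need to account for field sizes and cache-line boundaries, which this sketch ignores.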
doi:10.1145/3240302.3240417 dblp:conf/memsys/MiucinF18 fatcat:ay76jpnd3jcfzjpqyw42c4vemm