Improving the scalability of neutron cross-section lookup codes on multicore NUMA system [article]

Kazutomo Yoshii, John Tramm, Andrew Siegel, Pete Beckman
2019 arXiv pre-print
We use the XSBench proxy application, a memory-intensive OpenMP program, to explore the source of on-node scalability degradation of a popular Monte Carlo (MC) reactor physics benchmark on non-uniform memory access (NUMA) systems. As background, we present the details of XSBench, a performance abstraction "proxy app" for the full MC simulation, as well as the internal design of the Linux kernel. We explain how physical memory allocation inside the kernel affects the multicore scalability of XSBench. On a sixteen-core, two-socket NUMA testbed, the NUMA optimization improves scaling efficiency from 70% to 95%, and the optimized version consumes 25% less energy than the nonoptimized version. In addition to the NUMA optimization, we evaluate a page-size optimization to XSBench and observe a 1.5x performance improvement over the nonoptimized version.
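To put the efficiency figures in concrete terms, the short sketch below converts them into implied speedups on the sixteen-core testbed. It assumes the usual strong-scaling definition, efficiency = T_1 / (N * T_N), which the abstract does not spell out; the function names are hypothetical and only the 70%, 95%, and sixteen-core values come from the text.

```python
def parallel_efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Strong-scaling efficiency: T_1 / (N * T_N).

    Assumed definition; the abstract does not state which one it uses.
    """
    return t_serial / (t_parallel * cores)


def implied_speedup(efficiency: float, cores: int) -> float:
    """Speedup over a single core implied by a given efficiency on `cores` cores."""
    return efficiency * cores


CORES = 16  # sixteen-core, two-socket testbed from the abstract

baseline = implied_speedup(0.70, CORES)   # nonoptimized: 70% efficiency
optimized = implied_speedup(0.95, CORES)  # NUMA-optimized: 95% efficiency

print(f"nonoptimized speedup on {CORES} cores: {baseline:.1f}x")
print(f"optimized speedup on {CORES} cores:    {optimized:.1f}x")
print(f"gain from the NUMA optimization:    {optimized / baseline:.2f}x")
```

Under this assumed definition, the efficiency gain corresponds to roughly four additional cores' worth of useful throughput on the same hardware.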
arXiv:1909.03632v1 fatcat:tlg5i6pxg5f3lfgvwwhearbwqu