Operating system support for improving data locality on CC-NUMA compute servers

Ben Verghese, Scott Devine, Anoop Gupta, Mendel Rosenblum
1996 SIGPLAN Notices
The dominant architecture for the next generation of shared-memory multiprocessors is CC-NUMA (cache-coherent non-uniform memory architecture). These machines are attractive as compute servers because they provide transparent access to local and remote memory. However, the access latency to remote memory is 3 to 5 times the latency to local memory. CC-NOW machines provide the benefits of cache coherence to networks of workstations, at the cost of even higher remote access latency. Given the large remote access latencies of these architectures, data locality is potentially the most important performance issue. Using realistic workloads, we study the performance improvements provided by OS-supported dynamic page migration and replication. Analyzing our kernel-based implementation, we provide a detailed breakdown of the costs. We show that sampling of cache misses can be used to reduce cost without compromising performance, and that TLB misses may not be a consistent approximation for cache misses. Finally, our experiments show that dynamic page migration and replication can substantially increase application performance, by as much as 30%, and reduce contention for resources in the NUMA memory system.
doi:10.1145/248209.237205 fatcat:ll62vd7k3zdcjijav5ysr2jy6q