Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

An-Chow Lai, Babak Falsafi
2000 Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures - SPAA '00  
In this paper, we compare and contrast two techniques to improve capacity/conflict miss traffic in CC-NUMA DSM clusters. Page migration/replication optimizes read-write accesses to a page used by a single processor by migrating the page to that processor and replicates all read-shared pages in the sharers' local memories. R-NUMA optimizes read-write accesses to any page by allowing a processor to cache that page in its main memory. Page migration/ replication requires less hardware complexity
more » ... compared to R-NUMA, but has limited applicability and incurs much higher overheads even with tuned hardware/software support. In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications. Our results show that: (1) both page migration/replication and R-NUMA significantly improve the system performance over "first-touch" migration in many applications, (2) page migration/replication has limited opportunity and can not eliminate all the capacity/conflict misses even with fast hardware support and unlimited amount of memory, (3) R-NUMA always performs best given a page cache large enough to fit an application's primary working set and subsumes page migration/replication, (4) R-NUMA benefits more from hardware support to accelerate page operations than page migration/ replication, and (5) integrating page migration/replication into R-NUMA to help reduce the hardware cost requires sophisticated mechanisms and policies to select candidates for page migration/ replication.
doi:10.1145/341800.341811 dblp:conf/spaa/LaiF00 fatcat:xpdbdm5wmfcctazvqt2h476h5u