Modeling and Stack Simulation of CMP Cache Capacity and Accessibility
IEEE Transactions on Parallel and Distributed Systems
Performance tradeoffs between fast data access by local data replication and cache capacity maximization by global data sharing have been extensively studied for many-core Chip-Multiprocessors (CMPs). Costly simulations over a wide spectrum of the design space are generally required to gain insight for a sound design. To lower the cost, we develop an abstract model for understanding the performance impact of data replication on CMP caches. To overcome the lack of real-time interactions among multiple cores in the model, we further develop an efficient single-pass stack simulation to study the performance of CMP cache organizations with various degrees of data replication. The global stack logically incorporates a shared stack and per-core private stacks; shared/private reuse (stack) distances can be collected in a single-pass simulation. With the reuse distances, one can calculate the performance of CMP cache organizations with various degrees of data replication. We verify both the model and the stack simulation against execution-driven simulations with commercial multithreaded workloads. The results show that the abstract model provides accurate information about performance tradeoffs of data replication. The stack simulation accurately predicts the performance of various cache organizations with 2-9% error margins using only about 8% of the simulation time.
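To illustrate the idea of collecting shared and private reuse (stack) distances in one pass, the following is a minimal sketch and not the authors' implementation. It assumes a fully associative LRU stack, a hypothetical trace format of `(core, block)` pairs, and it ignores coherence invalidations and replication policy, both of which a real CMP simulation would have to handle. The names `stack_distance`, `simulate`, and `hits` are illustrative only.

```python
from collections import Counter


def stack_distance(stack, block):
    """Return the LRU stack distance of `block` (1 = most recently used),
    or None on a first touch, then move `block` to the top of `stack`."""
    try:
        depth = stack.index(block) + 1   # position from the top, 1-based
        stack.pop(depth - 1)             # remove from its old position
    except ValueError:
        depth = None                     # cold miss: block never seen in this stack
    stack.insert(0, block)               # block becomes most recently used
    return depth


def simulate(trace, num_cores):
    """Single pass over `trace`, an iterable of (core, block) pairs.

    Assumption: one logically shared stack sees every access, while each
    core also keeps a private stack that sees only its own accesses.
    Returns (shared_hist, private_hist): histograms of stack distances.
    """
    shared = []
    private = [[] for _ in range(num_cores)]
    shared_hist, private_hist = Counter(), Counter()
    for core, block in trace:
        shared_hist[stack_distance(shared, block)] += 1
        private_hist[stack_distance(private[core], block)] += 1
    return shared_hist, private_hist


def hits(hist, capacity_blocks):
    """Accesses that would hit in an LRU cache holding `capacity_blocks` blocks."""
    return sum(n for d, n in hist.items()
               if d is not None and d <= capacity_blocks)


# Hypothetical five-access trace on two cores.
trace = [(0, "A"), (1, "B"), (0, "A"), (1, "A"), (0, "B")]
shared_hist, private_hist = simulate(trace, num_cores=2)
```

Because the histograms are gathered once, hit counts for any capacity can then be read off with `hits(hist, capacity)` without replaying the trace, which is the property that makes single-pass stack simulation cheaper than simulating each cache size separately.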