NoCMsg: Scalable NoC-Based Message Passing
2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Current processor design with ever more cores may ensure that theoretical compute performance still follows past increases (resting from Moore's law), but they also increasingly present a challenge to hardware and software alike. As the core count increases, the network-on-chip (NoC) topology has changed from buses over rings and fully connected meshes to 2D meshes. The question is which programming paradigm provides the scalability needed to ensure performance is close to theoretical peak,
... eoretical peak, where 2D meshes provide the most scalable design to date. This work contributes NoCMsg, a low-level message passing abstraction over NoCs. NoCMsg is specifically designed for large core counts in 2D meshes. Its design ensures deadlock free messaging for wormhole Manhattan-path routing over the NoC. Experimental results on the TilePro hardware platform show that NoCMsg can significantly reduce communication times by up to 86% for single packet messages and up to 40% for larger messages compared to other NoC-based message approaches. Results further demonstrate the potential of NoC messaging to outperform shared memory abstractions by up to 93% as core counts and inter-process communication increase, i.e., we observe that shared memory scales up to about 16 cores while message passing performs well beyond that threshold on this platform. To the best of our knowledge, this is the first head-on comparison of shared memory and advanced message passing specifically designed for NoCs on an actual hardware platform with larger core counts on a single socket.