A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2014; you can also visit <a rel="external noopener" href="http://www.cs.rochester.edu/u/scott/papers/1995_CAN_minimal_HW.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="Association for Computing Machinery (ACM)">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/35q3ync5nbhnjfpylznlz57lyi" style="color: black;">SIGARCH Computer Architecture News</a>
Shared memory is widely regarded as a more intuitive model than message passing for the development of parallel programs. A shared memory model can be provided by hardware, software, or some combination of both. One of the most important problems to be solved in shared memory environments is that of cache coherence. Experience indicates, unsurprisingly, that hardware-coherent multiprocessors greatly outperform distributed shared-memory (DSM) emulations on message-passing hardware. Intermediate options, however, have received considerably less attention. We argue in this position paper that one such option (a multiprocessor or network that provides a global physical address space in which processors can make non-coherent accesses to remote memory without trapping into the kernel or interrupting remote processors) can provide most of the performance of hardware cache coherence at little more monetary or design cost than traditional DSM systems. To support this claim we have developed the Cashmere family of software coherence protocols for NCC-NUMA (Non-Cache-Coherent, Non-Uniform-Memory Access) systems, and have used execution-driven simulation to compare the performance of these protocols to that of full hardware coherence and distributed shared memory emulation. We have found that for a large class of applications the performance of NCC-NUMA multiprocessors rivals that of fully hardware-coherent designs, and significantly surpasses the performance realized on more traditional DSM systems.<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/218864.218870">doi:10.1145/218864.218870</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ukmwch45y5gkvlmqfnkunzd6zi">fatcat:ukmwch45y5gkvlmqfnkunzd6zi</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20141025115224/http://www.cs.rochester.edu/u/scott/papers/1995_CAN_minimal_HW.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/78/20/78200c86788d5887c55aaff3195ab010b40959f6.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/218864.218870"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> acm.org </button> </a>