A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2006; you can also visit <a rel="external noopener" href="http://www.cse.psu.edu:80/~anand/csl/papers/hpca97b.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<i title="IEEE Comput. Soc. Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/n7ljjecrpje5pj66pk4pvx65qu" style="color: black;">Proceedings Third International Symposium on High-Performance Computer Architecture</a>
Shared memory machines offer the convenience of a shared address space. This makes them particularly appealing for applications with dynamic communication behavior, since the mechanisms for data transfer between processors are hidden from the programmer. But the scalability of these machines is often limited by the latencies incurred in accessing locations in remote memories. Caches alleviate this problem by exploiting the temporal and spatial locality in an application. However, the induced<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/hpca.1997.569660">doi:10.1109/hpca.1997.569660</a> <a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/hpca/Sivasubramaniam97.html">dblp:conf/hpca/Sivasubramaniam97</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/g7jhtrw3fbaedhpboupv4df44q">fatcat:g7jhtrw3fbaedhpboupv4df44q</a> </span>
traffic for maintaining coherence can significantly limit performance. Invalidation-based protocols for coherence maintenance are conservative and always resort to receiver-initiated communication. Thus the receiver may have to experience the entire latency of the data transfer even though the data item may have been available much earlier. Update-based schemes, though sender-initiated, can incur high write overheads by sending redundant updates to processors that may not need them. The goal of this research is to reduce the read and write latencies of applications with dynamic communication behavior by employing intelligent sender-initiated data transfer mechanisms. In the process, we would like to keep our demands on the programmer, the compiler, and the hardware as low as possible. Towards this goal, we present a set of write primitives that lower the communication overhead for shared memory accesses governed by locks. We demonstrate the performance benefits of these primitives using a database application drawn from the Geographical Information Systems (GIS) domain. We explore the competitive update mechanism for the remaining shared memory accesses. Using a set of applications, we examine the amount of history that we need to maintain for an effective competitive update scheme. We also show how this effective scheme can be implemented in software on emerging shared memory architectures with little hardware support.
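As a rough illustration of the competitive update idea mentioned above, the following sketch models the commonly described form of such a scheme: each cached copy counts the updates it has received since its last local access and invalidates itself once that count reaches a threshold, so the writer stops sending it redundant updates. This is a simplified model and not the paper's implementation; the names `CachedCopy`, `Block`, `threshold`, and `updates_sent` are hypothetical, and the threshold stands in for the "amount of history" the abstract refers to.

```python
from dataclasses import dataclass, field


@dataclass
class CachedCopy:
    """One processor's cached copy of a shared block (hypothetical model)."""
    valid: bool = True
    unused_updates: int = 0  # updates received since the last local access


@dataclass
class Block:
    """A shared block with a table of cached copies (assumed structure)."""
    threshold: int                               # assumed history limit
    copies: dict = field(default_factory=dict)   # processor id -> CachedCopy
    updates_sent: int = 0                        # total update messages sent

    def read(self, pid: int) -> None:
        copy = self.copies.get(pid)
        if copy is None or not copy.valid:
            # Miss: a receiver-initiated fetch installs a fresh copy.
            self.copies[pid] = CachedCopy()
        else:
            # Hit: a local access resets the history counter.
            copy.unused_updates = 0

    def write(self, writer: int) -> None:
        # The writer holds (or fetches) its own copy before writing.
        self.read(writer)
        for pid, copy in self.copies.items():
            if pid == writer or not copy.valid:
                continue  # no update is sent to invalid copies
            # Sender-initiated update to every other valid copy.
            self.updates_sent += 1
            copy.unused_updates += 1
            if copy.unused_updates >= self.threshold:
                # Too many unused updates: the copy drops out of sharing.
                copy.valid = False
```

With a small threshold the scheme behaves more like an invalidation protocol, and with a large one more like a pure update protocol; choosing the value is the trade-off between redundant updates and later read misses.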
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20060529001531/http://www.cse.psu.edu:80/~anand/csl/papers/hpca97b.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] </button> </a> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/hpca.1997.569660"> <button class="ui left aligned compact blue labeled icon button serp-button"> <i class="external alternate icon"></i> ieee.com </button> </a>