A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit <a rel="external noopener" href="http://www.supercomp.org/sc2003/paperpdfs/pap136.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
Improving the Scalability of Parallel Jobs by adding Parallel Awareness to the Operating System
<span title="">2003</span>
<i title="ACM Press">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/zigbcra6rjdivda6lkzknwuo5q" style="color: black;">Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03</a>
</i>
A parallel application benefits from scheduling policies that include a global perspective of the application's process working set. As the interactions among cooperating processes increase, mechanisms to ameliorate waiting within one or more of the processes become more important. In particular, collective operations such as barriers and reductions are extremely sensitive to even usually harmless events such as context switches among members of the process working set. For the last 18 months,
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1145/1048935.1050161">doi:10.1145/1048935.1050161</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/sc/JonesDNTBFBCMTR03.html">dblp:conf/sc/JonesDNTBFBCMTR03</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/utaonbdorveyrjgz4h2rvitcd4">fatcat:utaonbdorveyrjgz4h2rvitcd4</a>
</span>
we have been researching the impact of random short-lived interruptions, such as timer-decrement processing and periodic daemon activity, and developing strategies to minimize their impact on large processor-count SPMD bulk-synchronous programming styles. We present a novel co-scheduling scheme for improving the performance of fine-grain collective activities such as barriers and reductions, describe an implementation consisting of operating system kernel modifications and a run-time system, and present a set of empirical results comparing the technique with traditional operating system scheduling. Our results indicate a speedup of over 300% on synchronizing collectives.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20170808213740/http://www.supercomp.org/sc2003/paperpdfs/pap136.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/c6/30/c630a0d897e3204ecf9d7ae9c73d01a332dc6da4.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>