A Comparative Performance Study for Compute Node Sharing

Jeho Park, Shui F. Lam
<span title="2012-12-30">2012</span> <i title="Korean Institute of Information Scientists and Engineers"> <a target="_blank" rel="noopener" href="https://fatcat.wiki/container/xvozmuuqprbr3mvk4th3ep6pea" style="color: black;">Journal of Computing Science and Engineering</a> </i> &nbsp;
We introduce a methodology for studying the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters, and we report our findings. We assume that parallel jobs arriving at a cluster must share a set of nodes with the jobs of other users: they compete for processor time in a time-sharing manner, and for other limited resources such as memory and I/O in a space-sharing manner. Under this assumption, we developed a methodology to simulate job arrivals to a set of compute nodes, and to gather and process performance data in order to calculate the percentage slowdown of parallel jobs. Our goal in this study is to identify combinations of jobs that minimize the performance degradation caused by resource sharing and contention. Through our experiments, we found a couple of interesting behaviors of overlapped parallel jobs, which may be used to suggest alternative job allocation schemes that reduce the slowdowns inevitably caused by resource sharing on a high performance computing cluster. We suggest three job allocation strategies based on our empirical results and propose further study of the results using a supercomputing facility at the San Diego Supercomputer Center. Category: Smart and intelligent computing
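The abstract does not define how percentage slowdown is computed. A minimal sketch, assuming the common definition (shared-run time relative to a dedicated-run baseline), might look like the following. The function names, the input layout, and the use of a mean to compare job combinations are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def percent_slowdown(dedicated_seconds: float, shared_seconds: float) -> float:
    """Percentage slowdown of a job run on shared nodes versus a dedicated run.

    Assumed definition (not confirmed by the paper):
        (T_shared - T_dedicated) / T_dedicated * 100
    """
    if dedicated_seconds <= 0:
        raise ValueError("dedicated_seconds must be positive")
    return (shared_seconds - dedicated_seconds) / dedicated_seconds * 100.0


def mean_slowdown(runs: List[Tuple[float, float]]) -> float:
    """Average percentage slowdown over (dedicated, shared) timing pairs."""
    if not runs:
        raise ValueError("runs must not be empty")
    return sum(percent_slowdown(d, s) for d, s in runs) / len(runs)


def best_combination(timings: Dict[str, List[Tuple[float, float]]]) -> str:
    """Return the job combination with the lowest mean percentage slowdown.

    `timings` maps a hypothetical combination label to the
    (dedicated, shared) timings of the jobs overlapped in that combination.
    """
    return min(timings, key=lambda name: mean_slowdown(timings[name]))
```

For example, a job that takes 100 seconds on dedicated nodes and 125 seconds when sharing them has a slowdown of 25 percent under this definition; comparing mean slowdowns across candidate combinations is one simple way to rank them.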
<span class="external-identifiers"> <a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.5626/jcse.2012.6.4.287">doi:10.5626/jcse.2012.6.4.287</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/rrqwhqqmlndg5j3ubuuazv3ffm">fatcat:rrqwhqqmlndg5j3ubuuazv3ffm</a> </span>
