A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2016; you can also visit <a rel="external noopener" href="http://pcl.intel-research.net/publications/IPDPS_2015_PETSc-FUN3D.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
Exploring Shared-Memory Optimizations for an Unstructured Mesh CFD Application on Modern Parallel Systems
<span title="">2015</span>
<i title="IEEE">
<a target="_blank" rel="noopener" href="https://fatcat.wiki/container/t3x4vqewrncrfgn2wu7cafsbsq" style="color: black;">2015 IEEE International Parallel and Distributed Processing Symposium</a>
</i>
In this work, we revisit the 1999 Gordon Bell Prize-winning PETSc-FUN3D aerodynamics code, extending it with highly tuned shared-memory parallelization and detailed performance analysis on modern highly parallel architectures. An unstructured-grid implicit flow solver, which forms the backbone of computational aerodynamics, poses particular challenges due to its large irregular working sets, unstructured memory accesses, and variable/limited amount of parallelism. This code, based on a domain
<span class="external-identifiers">
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/ipdps.2015.114">doi:10.1109/ipdps.2015.114</a>
<a target="_blank" rel="external noopener" href="https://dblp.org/rec/conf/ipps/MudigereSDPHSKD15.html">dblp:conf/ipps/MudigereSDPHSKD15</a>
<a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/fajgerf4zfgtnb2l2wafdnd27y">fatcat:fajgerf4zfgtnb2l2wafdnd27y</a>
</span>
decomposition approach, exposes tradeoffs between the number of threads assigned to each MPI-rank subdomain and the total number of domains. By applying several algorithm- and architecture-aware optimization techniques for unstructured grids, we show a 6.9X speed-up in performance on a single-node Intel® Xeon™ E5-2690 v2 processor relative to the out-of-the-box compilation. Our scaling studies on the TACC Stampede supercomputer show that our optimizations continue to provide performance benefits over the baseline implementation as we scale up to 256 nodes.
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20160629233213/http://pcl.intel-research.net/publications/IPDPS_2015_PETSc-FUN3D.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext">
<button class="ui simple right pointing dropdown compact black labeled icon button serp-button">
<i class="icon ia-icon"></i>
Web Archive
[PDF]
<div class="menu fulltext-thumbnail">
<img src="https://blobs.fatcat.wiki/thumbnail/pdf/f0/b8/f0b85ef85c415fd9d1b8c0fbcdab1e6d307dd7bf.180px.jpg" alt="fulltext thumbnail" loading="lazy">
</div>
</button>
</a>
<a target="_blank" rel="external noopener noreferrer" href="https://doi.org/10.1109/ipdps.2015.114">
<button class="ui left aligned compact blue labeled icon button serp-button">
<i class="external alternate icon"></i>
ieee.com
</button>
</a>