Scalable Parallel Octree Meshing for TeraScale Applications
Tiankai Tu, David R. O'Hallaron, Omar Ghattas
ACM/IEEE SC 2005 Conference (SC'05)
We present a new methodology for generating and adapting octree meshes for terascale applications. Our approach combines existing methods, such as parallel octree decomposition and space-filling curves, with a set of new methods that address the special needs of parallel octree meshing. We have implemented these techniques in a parallel meshing tool called Octor. Performance evaluations on up to 2000 processors show that Octor has good isogranular scalability, fixed-size scalability, and absolute running time. Octor also provides a novel data access interface to parallel PDE solvers and parallel visualization pipelines, making it possible to develop tightly coupled end-to-end finite element simulations on terascale systems.

Introduction

The emergence of terascale computing has created unprecedented opportunities for scientists and engineers to simulate complex physical phenomena on a larger scale and at a higher resolution than heretofore possible. This trend has introduced new challenges to applications, algorithms, and systems software. In particular, a significant challenge for terascale finite element simulations is how to generate and adapt high-resolution unstructured meshes with billions of nodes and elements, and how to deliver such meshes to the processors of terascale systems.

A typical approach to preparing a finite element mesh for simulation is to first generate a large mesh structure offline on a server [34, 35], and then upload the mesh to a supercomputer, where additional steps such as mesh partitioning [22] and data redistribution are performed. The result is a series of time-consuming file transfers and disk I/O operations that consume large amounts of network and storage resources while contributing nothing to the applications that will use the mesh. Further, because the meshes are generated offline on a server, the offline algorithm cannot adapt the mesh dynamically at runtime. We propose a better approach in which the meshes are generated in situ, on the same processors where they will later be used by applications such as finite element solvers and visualization pipelines.

In this paper, we describe a new methodology for parallel octree meshing for terascale applications. At first glance, parallel octree meshing may appear to be simply a direct application of the well-known parallel octree method. In fact, it is fundamentally different from other parallel octree applications in that it must manipulate the vertices (corners) of the octants, which correspond to the mesh nodes. This is required to support finite element simulations, which associate unknowns with mesh nodes and then solve the resulting linear system. Complicated correlations between octants and vertices, and among the vertices themselves, whether on the same processor or on different processors, must all be identified and tracked. Thus, parallel octree meshing presents a set of new problems that do not arise in other parallel octree applications.

Our work builds on a foundation of previous work on parallel octrees [4, 10, 14, 32, 37, 39], mesh generation [6, 7, 8, 13, 14, 21, 26, 31], parallel adaptive mesh refinement [2, 9, 23, 38], parallel adaptive finite element methods [18, 28], and space-filling curves [5, 15]. To address the special requirements of parallel octree meshing, we have developed a set of new algorithms and techniques: (1) a new algorithm called parallel prioritized ripple propagation that balances an octree efficiently; (2) a sophisticated memory management scheme that facilitates data migration between processors; (3) a new method of extracting parallel mesh structures that delivers mesh data directly to solvers; and (4) a novel data access interface that allows solvers or visualizers to interact easily with the mesher. Brief sketches illustrating several of these ideas follow this introduction.

We have implemented our methodology in a new parallel meshing tool called Octor. Performance evaluations show that Octor has good isogranular scalability, good fixed-size scalability, and good absolute running time.
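The abstract pairs parallel octree decomposition with space-filling curves. A standard way to realize that pairing, shown below as a generic sketch rather than Octor's actual encoding, is to give each octant a Morton (Z-order) locational code by interleaving the bits of its anchor coordinates: sorting octants by code linearizes the tree along a space-filling curve, and cutting the sorted sequence into contiguous chunks partitions it across processors with good spatial locality.

```c
#include <stdint.h>
#include <stdio.h>

/* Interleave the low 21 bits of x, y, and z into a 63-bit Morton code.
 * Octants that are close in space get nearby codes, so a sort by code
 * followed by an even cut of the sequence yields a locality-preserving
 * partition. This is a textbook sketch, not Octor's implementation. */
static uint64_t morton3d(uint32_t x, uint32_t y, uint32_t z)
{
    uint64_t code = 0;
    for (int i = 0; i < 21; i++) {
        code |= ((uint64_t)((x >> i) & 1)) << (3 * i);
        code |= ((uint64_t)((y >> i) & 1)) << (3 * i + 1);
        code |= ((uint64_t)((z >> i) & 1)) << (3 * i + 2);
    }
    return code;
}

int main(void)
{
    /* Two octants adjacent in space receive nearby codes. */
    printf("%llu\n", (unsigned long long)morton3d(1, 2, 3));
    printf("%llu\n", (unsigned long long)morton3d(2, 2, 3));
    return 0;
}
```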
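The point that octree meshing must manipulate octant vertices can be made concrete with a small sketch. In the integer coordinate system of the octree domain, an octant's eight corners are fully determined by its anchor and edge length; touching octants produce identical coordinates for the corners they share, so deduplicating corner keys (for example, with a hash table, locally and then across processors) identifies the global mesh nodes that solvers attach unknowns to. The types and names below are illustrative assumptions, not Octor's data structures.

```c
#include <stdint.h>
#include <stdio.h>

/* Corner coordinates in the integer coordinate system of the octree
 * domain. Name and layout are hypothetical, for illustration only. */
typedef struct {
    uint32_t x, y, z;
} vertex_key_t;

/* Enumerate the eight corner vertices of an octant whose anchor (lowest
 * corner) is (x, y, z) and whose edge length is h. Bit c of the corner
 * index selects the low or high face in each dimension. */
static void octant_corners(uint32_t x, uint32_t y, uint32_t z, uint32_t h,
                           vertex_key_t out[8])
{
    for (int c = 0; c < 8; c++) {
        out[c].x = x + (uint32_t)((c >> 0) & 1) * h;
        out[c].y = y + (uint32_t)((c >> 1) & 1) * h;
        out[c].z = z + (uint32_t)((c >> 2) & 1) * h;
    }
}

int main(void)
{
    vertex_key_t v[8];
    octant_corners(4, 4, 4, 2, v);   /* an octant with edge length 2 */
    for (int c = 0; c < 8; c++)
        printf("corner %d: (%u, %u, %u)\n", c, v[c].x, v[c].y, v[c].z);
    return 0;
}
```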
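Parallel prioritized ripple propagation addresses the standard 2-to-1 balance constraint: leaf octants that touch may differ by at most one refinement level, and refining one octant to repair a violation can create new violations that ripple outward. The sketch below shows only that serial ripple effect, in a 1D binary-tree analogue where the neighbors of a leaf are simply the adjacent leaves; the prioritization and interprocessor coordination that make the paper's algorithm efficient are not captured here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* 1D analogue of 2-to-1 balancing. Leaves tile an interval from left to
 * right; a leaf at level l has length 2^-l, so positions are implied by
 * the level sequence. Error checking is omitted for brevity. */

/* Split leaf i into two children one level finer. */
static void split(int **lev, int *n, int i)
{
    *lev = realloc(*lev, (size_t)(*n + 1) * sizeof(int));
    memmove(*lev + i + 1, *lev + i, (size_t)(*n - i) * sizeof(int));
    (*lev)[i] += 1;             /* left child  */
    (*lev)[i + 1] = (*lev)[i];  /* right child */
    (*n)++;
}

/* Refine the coarser side of every violating pair until none remain;
 * each split may break the pair to its left, so the scan steps back. */
static void balance(int **lev, int *n)
{
    for (int i = 0; i + 1 < *n; ) {
        if (abs((*lev)[i] - (*lev)[i + 1]) > 1) {
            split(lev, n, (*lev)[i] < (*lev)[i + 1] ? i : i + 1);
            if (i > 0) i--;
        } else {
            i++;
        }
    }
}

int main(void)
{
    int n = 6;
    int *lev = malloc((size_t)n * sizeof(int));
    /* A valid but unbalanced tiling of [0,1): 1/2 + 4*(1/16) + 1/4 = 1. */
    int init[] = { 1, 4, 4, 4, 4, 2 };
    memcpy(lev, init, sizeof(init));
    balance(&lev, &n);
    for (int i = 0; i < n; i++)
        printf("%d ", lev[i]);  /* adjacent levels now differ by <= 1 */
    printf("\n");
    free(lev);
    return 0;
}
```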
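Finally, the data access interface can be pictured as a small query API through which a solver or visualizer reads its local portion of the distributed mesh in place, with no intermediate mesh files. Every name below is an assumption made for illustration; the backing "mesh" is a trivial stand-in so the sketch runs, and the paper defines Octor's actual interface.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical interface sketch: the solver runs on the same processors
 * as the mesher and traverses local elements through query calls. */
typedef struct {
    int64_t nelem;   /* elements owned by this processor */
} mesh_t;

static int64_t mesh_local_element_count(const mesh_t *m)
{
    return m->nelem;
}

/* Global ids of the eight vertices of local hexahedral element e.
 * Real connectivity would come from the mesher; this stand-in just
 * numbers corners consecutively so the example is self-contained. */
static void mesh_element_nodes(const mesh_t *m, int64_t e, int64_t nodes[8])
{
    (void)m;
    for (int c = 0; c < 8; c++)
        nodes[c] = 8 * e + c;
}

int main(void)
{
    mesh_t m = { 2 };
    int64_t nodes[8];

    /* A solver-style traversal: visit every node of every local element,
     * as an assembly loop would, without touching the file system. */
    for (int64_t e = 0; e < mesh_local_element_count(&m); e++) {
        mesh_element_nodes(&m, e, nodes);
        for (int c = 0; c < 8; c++)
            printf("element %lld node %lld\n",
                   (long long)e, (long long)nodes[c]);
    }
    return 0;
}
```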
doi:10.1109/sc.2005.61
dblp:conf/sc/TuOG05