A stencil scaling approach for accelerating matrix-free finite element implementations [article]

Simon Bauer, Daniel Drzisga, Marcus Mohr, Ulrich Ruede, Christian Waluga, Barbara Wohlmuth
2018 arXiv   pre-print
We present a novel approach to fast on-the-fly low order finite element assembly for scalar elliptic partial differential equations of Darcy type with variable coefficients optimized for matrix-free implementations  ...  Our approach introduces a new operator that is obtained by appropriately scaling the reference stiffness matrix from the constant coefficient case.  ...  The authors gratefully acknowledge the Gauss Centre for Supercomputing (GCS) for providing computing time on the supercomputer SuperMUC at Leibniz-Rechenzentrum (LRZ).  ... 
arXiv:1709.06793v2 fatcat:uvbkzkriinevzhxiqyh45nkbeq

A Scalable and Modular Software Architecture for Finite Elements on Hierarchical Hybrid Grids [article]

Nils Kohl, Dominik Thönnes, Daniel Drzisga, Dominik Bartuschat, and Ulrich Rüde
2018 arXiv   pre-print
In this article, a new generic higher-order finite-element framework for massively parallel simulations is presented.  ...  Combining an unstructured topology with structured grid refinement facilitates high geometric adaptability and matrix-free multigrid implementations with excellent performance.  ...  Acknowledgements This work was partly supported by the German Research Foundation through the Priority Programme 1648 "Software for Exascale Computing" (SPPEXA) and by grant WO671/11-1.  ... 
arXiv:1805.10167v1 fatcat:uxexq24lgjdgxp66mtvkyir5zy

Preconditioning spectral element schemes for definite and indefinite problems

Yair Shapira, Moshe Israeli, Avram Sidi, Uzi Zrahia
1999 Numerical Methods for Partial Differential Equations  
A multigrid preconditioner is also derived from the finite difference preconditioner and is found suitable for the CGS acceleration method.  ...  Spectral element schemes for the solution of elliptic boundary value problems are considered. Preconditioning methods based on finite difference and finite element schemes are implemented.  ...  The above implementation generates another possible approach: replace the finite difference stencil at the finest level of AutoMUG by the original spectral element stencil (scaled such that its central  ... 
doi:10.1002/(sici)1098-2426(199909)15:5<535::aid-num1>;2-r fatcat:lbaaty2bnbfhjbbt7vw27sw3hq

Stencil scaling for vector-valued PDEs on hybrid grids with applications to generalized Newtonian fluids [article]

Daniel Drzisga, Ulrich Rüde, Barbara Wohlmuth
2020 arXiv   pre-print
Matrix-free finite element implementations for large applications provide an attractive alternative to standard sparse matrix data formats due to the significantly reduced memory consumption.  ...  The presented method is based on scaling constant reference stencils originating from a linear finite element discretization instead of evaluating the bilinear forms on-the-fly.  ...  The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V.  ... 
arXiv:1908.08666v2 fatcat:46xqoynaa5cafiohyookgtcpwy

High-Level Programming of Stencil Computations on Multi-GPU Systems Using the SkelCL Library

Michel Steuwer, Michael Haidl, Stefan Breuer, Sergei Gorlatch
2014 Parallel Processing Letters  
The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually-tuned coding using low-level approaches like OpenCL and CUDA  ...  This makes development of stencil applications a complex, time-consuming, and error-prone task.  ...  Using the FDTD method, we implemented a simulation of the effect of random lasing on a nano-meter scale [5] for our evaluation.  ... 
doi:10.1142/s0129626414410059 fatcat:ofwl3g2v6zatrjncnimtsuxvtm

Optimal FFT-accelerated Finite Element Solver for Homogenization [article]

Martin Ladecký, Richard J. Leute, Ali Falsafi, Ivana Pultarová, Lars Pastewka, Till Junge, Jan Zeman
2022 arXiv   pre-print
We propose a matrix-free finite element (FE) homogenization scheme that is considerably more efficient than generic FE implementations.  ...  The efficiency of our scheme follows from a preconditioned well-scaled reformulation allowing for the use of the conjugate gradient or similar iterative solvers.  ...  Matrix-free Implementation As mentioned in the previous sections, the explicit matrix structure is useful for explanation, but the computations can be performed more efficiently in a matrix-free manner  ... 
arXiv:2203.02962v1 fatcat:tiv3pfkzavgw5hjvymu4ofkmwi

Stencil-Aware GPU Optimization of Iterative Solvers

Daniel Lowell, Jeswin Godwin, Justin Holewinski, Deepan Karthik, Chekuri Choudary, Azamat Mametjanov, Boyana Norris, Gerald Sabin, P. Sadayappan, Jason Sarich
2013 SIAM Journal on Scientific Computing  
Numerical solutions of nonlinear partial differential equations frequently rely on iterative Newton-Krylov methods, which linearize a finite-difference stencil-based discretization of a problem, producing  ...  We also describe autotuning of CUDA implementations based on high-level descriptions of the stencil-based matrix and vector operations.  ...  We thank Barry Smith of Argonne and other members of the PETSc team for fruitful discussions and ongoing support.  ... 
doi:10.1137/120883153 fatcat:ds2t67ie4bcehh7qsbbomwdogu

TerraNeo—Mantle Convection Beyond a Trillion Degrees of Freedom [chapter]

Simon Bauer, Hans-Peter Bunge, Daniel Drzisga, Siavash Ghelichkhan, Markus Huber, Nils Kohl, Marcus Mohr, Ulrich Rüde, Dominik Thönnes, Barbara Wohlmuth
2020 Lecture Notes in Computational Science and Engineering  
This contribution reports on the TerraNeo project which delivered novel matrix-free geometric multigrid solvers for the Stokes system that forms the core of mantle convection models.  ...  Simulation of mantle convection on planetary scales is considered a grand-challenge application even in the exascale era.  ...  A Stencil Scaling Approach for Accelerating Matrix-Free Finite Element Implementations In [9] we present a novel approach to fast on-the-fly low order finite element assembly for scalar elliptic partial  ... 
doi:10.1007/978-3-030-47956-5_19 fatcat:4jhueyv5qrhfdkmpfxxv5lsmqa

Modelling Seismic Wave Propagation for Geophysical Imaging [chapter]

Jean Virieux, Vincent Etienne, Victor Cruz-Atienza, Romain Brossier, Emmanuel Chaljub, Olivier Coutant, Stéphane Garambois, Diego Mercerat, Vincent Prieux, Stéphane Operto, Alessandra Ribodetti, Josué Tago
2012 Seismic Waves - Research and Analysis  
Acknowledgements We are grateful to René-Édouard Plessix (SHELL) and Henri Calandra (TOTAL) for fruitful discussions. This work was partially performed using HPC resources from GENCI-  ...  This strategy is opposite to the finite element approach where often the mass matrix is lumped into a diagonal matrix for explicit time integration (Marfurt, 1984).  ...  This is a quite natural approach similar to the one used in continuous finite-element methods where the test function is set to zero on the free surface boundary.  ... 
doi:10.5772/30219 fatcat:4h2ysh6dsfefnga6ystdvyqbxe

A Performance Comparison of Continuous and Discontinuous Galerkin Methods with Fast Multigrid Solvers

Martin Kronbichler, Wolfgang A. Wall
2018 SIAM Journal on Scientific Computing  
For the hybridized discontinuous Galerkin method, a multigrid approach that combines a grid transfer from the trace space to the space of linear finite elements with algebraic multigrid on further levels  ...  A roofline performance model confirms the advantage of the matrix-free implementation.  ...  Acknowledgments The authors would like to thank Katharina Kormann and Niklas Fehn for discussions about the manuscript.  ... 
doi:10.1137/16m110455x fatcat:o7o7ddc7fnhstnzcgrwld7mpoy

The Loop-of-Stencil-Reduce Paradigm

Marco Aldinucci, Marco Danelutto, Maurizio Drocco, Peter Kilpatrick, Guilherme Peretti Pezzi, Massimo Torquati
2015 2015 IEEE Trustcom/BigDataSE/ISPA  
The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a  ...  Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop.  ...  The universe of the game is a matrix of cells (for simplicity, we consider a finite non-toroidal world), where each cell can be in two states: alive or dead.  ... 
doi:10.1109/trustcom.2015.628 dblp:conf/trustcom/AldinucciDDKPT15 fatcat:evw5tsyrpjaybj367xka77ocpe

Performance of algebraic multigrid methods for non-symmetric matrices arising in particle methods [article]

Benjamin Seibold
2009 arXiv   pre-print
In contrast, for non-symmetric matrices, theoretical convergence results have been provided only recently. A property that is sufficient for convergence is that the matrix be an M-matrix.  ...  For both types of discretization approaches, we investigate the performance of a classical AMG method, as well as an AMLI type method.  ...  Sudarshan Tiwari for fruitful discussions and support on meshfree finite difference methods and on the application of algebraic multigrid codes.  ... 
arXiv:0905.3005v2 fatcat:ujznssutcnctnmibuncvu62kgy

Software Abstractions and Computational Issues in Parallel Structured Adaptive Mesh Methods for Electronic Structure Calculations [chapter]

Scott Kohn, John Weare, M. Elizabeth Ong, Scott Baden
2000 IMA Volumes in Mathematics and its Applications  
We have applied structured adaptive mesh refinement techniques to the solution of the LDA equations for electronic structure calculations.  ...  A snapshot of the PCG-FAC Hartree computation on eight of the Cray T3D. Portions of the timeline without a filled [...] represent numerical computation.  ...  We would like to thank Eric Bylaska and Steven Fink for numerous valuable discussions on numerical methods in materials science and parallel implementations and performance.  ... 
doi:10.1007/978-1-4612-1252-2_5 fatcat:enwm7jqkk5gktetks7lyz446pm

MachSuite: Benchmarks for accelerator design and customized architectures

Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, David Brooks
2014 2014 IEEE International Symposium on Workload Characterization (IISWC)  
MachSuite spans a broad application space, captures a variety of different program behaviors, and provides implementations tailored towards the needs of accelerator designers and researchers, including  ...  To improve standardization within the accelerator research community, we present MachSuite, a collection of 19 benchmarks for evaluating high-level synthesis tools and accelerator-centric architectures  ...  Our implementation is a wavefront computation that populates a square similarity matrix as it executes.  ... 
doi:10.1109/iiswc.2014.6983050 dblp:conf/iiswc/ReagenASWB14 fatcat:oijgahsvavczzchn7mp4jaj23y

Medusa: A C++ Library for solving PDEs using Strong Form Mesh-Free methods [article]

Jure Slak, Gregor Kosec
2019 arXiv   pre-print
Medusa, a novel library for implementation of strong form mesh-free methods, is described.  ...  Medusa implements the core mesh-free elements as independent blocks, which offers users great flexibility in experimenting with the method they are developing, as well as easily comparing it with other  ...  In mesh-free methods the computational domain is represented by a cloud of points instead of a mesh of elements, as is typical for mesh-based methods.  ... 
arXiv:1912.13282v1 fatcat:yiri54grm5bmzdgqufvc2p25i4