Extreme-scale Multigrid Components within PETSc [article]

Dave A. May, Patrick Sanan, Karl Rupp, Matthew G. Knepley, Barry F. Smith
2016 arXiv   pre-print
The scalability and time-to-solution of massively parallel multilevel preconditioners can be adversely affected by using a coarse-level solver with sub-optimal algorithmic complexity.  ...  To maintain scalability, agglomeration techniques applied to the coarse level have been shown to be necessary.  ...  Government retains for itself, and others acting on its behalf, a paid-up nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public  ... 
arXiv:1604.07163v1 fatcat:gpnszozdfzhepjv7xrkqmdvj7y

A Parallel Incompressible Flow Solver Package with a Parallel Multigrid Elliptic Kernel

John Z. Lou, Robert Ferraro
1996 Journal of Computational Physics  
A grid-partition strategy is used in the parallel implementations of both the flow solver and the multigrid elliptic kernel on all fine and coarse grids.  ...  Both the multigrid elliptic kernel and the flow solver scale very well to a large number of processors on the Intel Paragon and the Cray T3D for computations with moderate granularity.  ...  Steve McCormick (University of Colorado) for some helpful discussions on multigrid methods.  ... 
doi:10.1006/jcph.1996.0090 fatcat:wugxjvupcngwbkdokhnnx2rd7y

A parallel incompressible flow solver package with a parallel multigrid elliptic kernel

John Z. Lou, Robert D. Ferraro
1995 Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '95  
A grid-partition strategy is used in the parallel implementations of both the flow solver and the multigrid elliptic kernel on all fine and coarse grids.  ...  Both the multigrid elliptic kernel and the flow solver scale very well to a large number of processors on the Intel Paragon and the Cray T3D for computations with moderate granularity.  ...  Steve McCormick (University of Colorado) for some helpful discussions on multigrid methods.  ... 
doi:10.1145/224170.224406 dblp:conf/sc/LouF95 fatcat:ahargficdrbirmgkv4tdd46wha

Parallel geometric-algebraic multigrid on unstructured forests of octrees

Hari Sundar, George Biros, Carsten Burstedde, Johann Rudi, Omar Ghattas, Georg Stadler
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
We use geometric multigrid (GMG) for each of the octrees and algebraic multigrid (AMG) as the coarse grid solver.  ...  This forest-of-octrees approach enables us to generate meshes for complex geometries with arbitrary levels of local refinement.  ...  ACKNOWLEDGMENT The authors would like to thank Tobin Isaac for useful discussions and for providing the mesh for the Antarctic ice sheet. Support for this work was provided by: the U.S.  ... 
doi:10.1109/sc.2012.91 dblp:conf/sc/SundarBBRGS12 fatcat:o67eaqhs2vcbblgfr2ulonkp7m

Enabling scalable parallel implementations of structured adaptive mesh refinement applications

Sumir Chandra, Xiaolin Li, Taher Saif, Manish Parashar
2007 Journal of Supercomputing  
This paper presents a runtime engine that addresses the scalability of SAMR applications with localized refinements and high SAMR efficiencies on large numbers of processors (up to 1024 processors).  ...  The SAMR runtime engine augments hierarchical partitioning with bin-packing based load-balancing to manage the space-time heterogeneity of the SAMR grid hierarchy, and includes a communication substrate  ...  grant number PC295251 and 1052856 awarded to Manish Parashar.  ... 
doi:10.1007/s11227-007-0110-z fatcat:nmkub2egsrbvhgd42r6fvk6yuy

Anatomically accurate high resolution modeling of human whole heart electromechanics: A strongly scalable algebraic multigrid solver method for nonlinear deformation

Christoph M. Augustin, Aurel Neic, Manfred Liebmann, Anton J. Prassl, Steven A. Niederer, Gundolf Haase, Gernot Plank
2016 Journal of Computational Physics  
Journal of Computational Physics 305 (2016) 622-646  ...  heart beat in 44.3, 87.8 and 235.3 minutes, respectively.  ...  The efficiency of the method allows fast simulation cycles without compromising anatomical or biophysical detail.  ...  We acknowledge PRACE for awarding us access to resource SuperMUC based in Germany at LRZ (grant CAMEL), and, partially, ARCHER based in the UK at EPCC (project e384).  ... 
doi:10.1016/ pmid:26819483 pmcid:PMC4724941 fatcat:3kqewacfu5e2lbita7eeq64yeq

A hybrid interface preconditioner for monolithic fluid–structure interaction solvers

Matthias Mayr, Maximilian H. Noll, Michael W. Gee
2020 Advanced Modeling and Simulation in Engineering Sciences  
Powerful preconditioning techniques are crucial when it comes to solving large monolithic systems of linear equations efficiently, especially when arising from coupled multi-physics problems like in fluid-structure  ...  By performing cheap but accurate subdomain solves that do not depend on the separation of physical fields, this error accumulation can be reduced effectively.  ...  Acknowledgements This work was mostly performed while the author was at the Mechanics & High Performance Computing Group, Technical University of Munich, Parkring 35, 85748 Garching b. München.  ... 
doi:10.1186/s40323-020-00150-9 fatcat:sglbkenvcrfbrbx4h5x5qatkfi

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and its Application to Unstructured Matrices

Jongsoo Park, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Alexander Heinecke, Dhiraj D. Kalamkar, Xing Liu, Md. Mostofa Ali Patwary, Yutong Lu, Pradeep Dubey
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
While it is a well-known challenge to efficiently parallelize Gauss-Seidel smoother, the most time-consuming kernel in HPCG, our algorithmic and architecture-aware optimizations deliver 95% and 68% of the  ...  A new sparse high performance conjugate gradient benchmark (HPCG) has been recently released to address challenges in the design of sparse linear solvers for the next generation extreme-scale computing  ...  Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.  ... 
doi:10.1109/sc.2014.82 dblp:conf/sc/ParkSVHKLPLD14 fatcat:ktiisywie5hhznon5qc2tuydoa

High Performance Fortran for aerospace applications

Piyush Mehrotra, Hans Zima
2001 Parallel Computing  
We focus on the data structures and computational structures used in these codes and on the high-level strategies that can be expressed in HPF to optimally exploit the parallelism in these algorithms.  ...  This paper focuses on the use of High Performance Fortran (HPF) for important classes of algorithms employed in aerospace applications.  ...  Unstructured grid flow solvers generally use a finite element approach to spatially discretize the domain using piecewise linear flux functions over each individual triangle in 2D or tetrahedra in 3D.  ... 
doi:10.1016/s0167-8191(00)00073-9 fatcat:iwexaqc67rel3m3ujzdtuzavqa

Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

G. E. Hammond, P. C. Lichtner, R. T. Mills
2014 Water Resources Research  
To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport  ...  code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer.  ...  Parallelism is derived from a domain decomposition approach, in which the spatial simulation domain is decom-posed into nonoverlapping, contiguous subdomains, each of which is assigned to one parallel  ... 
doi:10.1002/2012wr013483 pmid:25506097 pmcid:PMC4257577 fatcat:cdwcpfmkhjgv5nofkavjmwcasm

Discretization and solution of elliptic PDEs-a digital signal processing approach

C.-C.J. Kuo, B.C. Levy
1990 Proceedings of the IEEE  
Whereas conventional PDE analysis techniques rely on matrix analysis and on a space-domain point of view to study the performance of solution methods, the DSP approach described here relies on frequency  ...  In the area of solution methods, we focus on methods applicable to self-adjoint positive definite elliptic PDEs.  ...  However, when relaxation is performed on the coarse grid, the majority of the processors become idle.  ... 
doi:10.1109/5.60919 fatcat:r7bjwtk6ejhvzl2jancjpputzu

D6.4: Report on approaches to Petascaling

Mohammad Jowkar, Carlo Cavazzoni, Xu Guo, Giorgos Goumas
2009 Zenodo  
time of WP6.  ...  Furthermore each application has been ported and optimized to several different architectures to get a better understanding of the suitability of the applications on different architectures and vice versa  ...  The code authors try to keep a uniform format throughout the application. Generally useful comments are found in most parts of the code, but mostly in German.  ... 
doi:10.5281/zenodo.6546112 fatcat:rsmdzoeqbbbdzoe2zkx3czi2ry

Parallel computing works!

Marcin Paprzycki
1996 IEEE Parallel & Distributed Technology Systems & Applications  
Nearby vortices are strongly coupled computationally, so it makes sense to assign them to the same processor. Binary bisection is used in the host to spatially decompose the domain.  ...  For a one-dimensional code on a hypercube, nearest neighbor sub-domains are assigned to nearest neighbor processors.  ...  Sim89 is designed to process a so-called 'mass raid' scenario, in which a few hundred primary threats are launched within a one to two minute time window, together with about 40-60 secondary, anti-satellite  ... 
doi:10.1109/mpdt.1996.7102339 fatcat:r6x46zu2pfd4rin4qpzt7ftqqq

High-Performance Computing: Dos and Don'ts [chapter]

Guillaume Houzeaux, Ricard Borrell, Yvan Fournier, Marta Garcia-Gasulla, Jens Henrik Göbbert, Elie Hachem, Vishal Mehta, Youssef Mesri, Herbert Owen, Mariano Vázquez
2018 Computational Fluid Dynamics - Basic Instruments and Applications in Science  
In UMA systems, the memory system is common to all the processors and this means that there is just one memory controller that can only serve one petition at the same time; when having several cores issuing  ...  On the other hand, NUMA nodes partition the memory among the different processors; although the main memory is seen as a whole, the access time depends on the memory location relative to the processor  ...  Acknowledgements Part of the research developments and results presented in this chapter were funded by: the European Union's Horizon 2020 Programme (2014-2020) and from Brazilian Ministry of Science,  ... 
doi:10.5772/intechopen.72042 fatcat:tyqy2hds2ratpgbv7up7ouocwq

D6.5: Report on Porting and Optimisation of applications

Sebastian von Alfthan, Giorgos Goumas, Olli-Pekka Lehto, Pekka Manninen, Mohammad Jowkar, Harald Klimach
2009 Zenodo  
Optimisation techniques are techniques for improving the performance of applications on a node-level.  ...  This document reports the optimisation and porting of applications in the PRACE application benchmark suite (PABS) to the PRACE-WP7 prototype machines.  ...  The main advantage of multigrid is that it accelerates the convergence of a base iterative method by correcting, from time to time, the solution globally by solving a coarse problem.  ... 
doi:10.5281/zenodo.6546114 fatcat:7qf7f6tnzvgjbdvyttbgtdroye