
GPGPU-based parallel computing applied in the FEM using the conjugate gradient algorithm: a review

Nileshchandra K Pikle, Shailesh R Sathe, Arvind Y Vyavhare
2018 Sadhana (Bangalore)  
Nevertheless, FEM applications have been successfully deployed on GPUs over the last 10 years to achieve a significant performance improvement.  ...  Parallelization of the finite-element method (FEM) has been contemplated by the scientific and high-performance computing community for over a decade.  ...  Some extraordinary strategies favourable to parallelization of FEM on GPU. In contrast with the earlier traditional FEM method, which performs assembly, the AF technique bypasses the generation of a global  ... 
doi:10.1007/s12046-018-0892-0 fatcat:igpuqhu6qfcjdo2zm6apnmpzhy
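
The snippet contrasts assembled FEM with the AF technique, which skips building the global matrix. Reading AF as "assembly-free" (our interpretation), a minimal sketch of conjugate gradient with an element-by-element matrix-vector product could look like this. The model problem (1D Poisson, linear elements, homogeneous Dirichlet ends) and all function names are illustrative assumptions, not the reviewed codes.

```python
# Assembly-free conjugate gradient sketch (pure Python).
# Assumption: -u'' = f on [0, 1], uniform mesh of linear elements,
# u(0) = u(1) = 0, so only interior nodes 1..n_elems-1 are unknowns.

def element_matvec(u, n_elems):
    """Compute y = K u element by element, never storing the global K."""
    h = 1.0 / n_elems
    ke = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]  # local stiffness of one element
    full = [0.0] + list(u) + [0.0]   # pad with the fixed boundary values
    y = [0.0] * (n_elems + 1)
    for e in range(n_elems):         # on a GPU: one thread per element (needs atomics)
        a, b = e, e + 1
        y[a] += ke[0][0] * full[a] + ke[0][1] * full[b]
        y[b] += ke[1][0] * full[a] + ke[1][1] * full[b]
    return y[1:-1]                   # drop the boundary rows

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def cg(b, n_elems, tol=1e-10, max_iter=1000):
    x = [0.0] * len(b)
    r = list(b)                      # r = b - K x with x = 0
    p = list(r)
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs ** 0.5 < tol:
            break
        ap = element_matvec(p, n_elems)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

With f = 1 the load entries are all h, and the linear-element solution matches u(x) = x(1 - x)/2 at the nodes, which makes the sketch easy to check.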

Parallelized Implementation of the Finite Particle Method for Explicit Dynamics in GPU

Jingzhe Tang, Yanfeng Zheng, Chao Yang, Wei Wang, Yaozhi Luo
2020 CMES - Computer Modeling in Engineering & Sciences  
To this end, a GPU-accelerated parallel strategy for the FPM is proposed in this paper.  ...  Using the Compute Unified Device Architecture (CUDA), the GPU implementations of the main tasks of the FPM, such as evaluating and assembling the element equivalent forces and solving the kinematic equations  ...  Performance tests on the speedup ratios for various types of FPM elements are reported to illustrate the improvements in performance achieved with GPU parallelization.  ... 
doi:10.32604/cmes.2020.08104 fatcat:et6qen67tnd3hf3znlol3o5v3i
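
The snippet names "evaluating and assembling the element equivalent forces" as a main GPU task. The two-phase pattern (independent per-element evaluation, then a scatter-add into nodal forces) can be sketched as below. The stand-in element is a simple 1D axial bar, an assumption for illustration only; it is not the FPM formulation used in the paper.

```python
# Sketch: per-element force evaluation followed by scatter-add assembly.
# Assumption (not the paper's FPM elements): 1D axial bars with stiffness
# EA / L0 joining consecutive particles.

def element_forces(positions, rest_lengths, ea):
    """Return, for each element, the forces on its two end nodes."""
    forces = []
    for e, l0 in enumerate(rest_lengths):     # independent: one GPU thread per element
        stretch = (positions[e + 1] - positions[e]) - l0
        tension = ea / l0 * stretch
        forces.append((tension, -tension))    # pulls node e forward, node e+1 back
    return forces

def assemble(forces, n_nodes):
    """Scatter-add element contributions; on a GPU this needs atomics or colouring."""
    f = [0.0] * n_nodes
    for e, (fa, fb) in enumerate(forces):
        f[e] += fa
        f[e + 1] += fb
    return f
```

Interior nodes shared by two elements receive two contributions, which is exactly where concurrent GPU threads would race without atomic additions or an element colouring.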

HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark [article]

Noel Chalmers, Abhishek Mishra, Damon McDougall, Tim Warburton
2022 arXiv   pre-print
HipBone is a fully GPU-accelerated C++ implementation of the original NekBone CPU proxy application with several novel algorithmic and implementation improvements which optimize its performance on modern  ...  Our optimizations include a conversion to store the degrees of freedom of the problem in assembled form in order to reduce the amount of data moved during the main iteration and a portable implementation  ...  Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.  ... 
arXiv:2202.12477v1 fatcat:nog4oaa6bjdvzgg6qghrmhrfzi
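
HipBone stores the degrees of freedom "in assembled form" to cut data movement. The distinction is between element-local storage (shared nodes duplicated) and assembled storage (one value per unique node), linked by a scatter Q and a gather Q^T. A minimal sketch on a 1D mesh, with hypothetical helper names, shows the two operators and the storage difference.

```python
# Sketch of scatter (global -> local) and gather (local -> global) operators.
# Assumption: 1D mesh of n_elems elements, each with p + 1 nodes, where
# neighbouring elements share one end node. Names are illustrative only.

def local_to_global(n_elems, p):
    """Global index of local node j in element e."""
    return [[e * p + j for j in range(p + 1)] for e in range(n_elems)]

def scatter(u_global, l2g):
    """Q u: copy each global value into every element that touches it."""
    return [[u_global[g] for g in elem] for elem in l2g]

def gather(u_local, l2g, n_global):
    """Q^T v: sum element-local contributions into unique global dofs."""
    v = [0.0] * n_global
    for elem_vals, elem in zip(u_local, l2g):
        for val, g in zip(elem_vals, elem):
            v[g] += val
    return v
```

For 3 elements of order 2, the local form holds 9 values while the assembled form holds 7; the gap grows with mesh size in higher dimensions, where many more nodes are shared.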

Finite Element Algorithms and Data Structures on Graphical Processing Units

I. Z. Reguly, M. B. Giles
2013 International journal of parallel programming  
We discuss the performance of different approaches in light of the implicit caches on Fermi GPUs and show a speedup over a two-socket 12-core CPU of up to 10 times in the assembly and up to 6 times in  ...  The finite element method (FEM) is one of the most commonly used techniques for the solution of partial differential equations on unstructured meshes.  ...  Related Work. Due to the high demand for accelerating finite element methods (FEM), several studies have investigated FEM implementations on GPUs.  ... 
doi:10.1007/s10766-013-0301-6 fatcat:gqp6iqfzzzamla4x5q5k2g2cwy

Introduction to assembly of finite element methods on graphics processors

Cristopher Cecka, Adrian Lew, Eric Darve
2010 IOP Conference Series: Materials Science and Engineering  
We also find that the optimal assembly strategy depends on the order of polynomials used in the finite-element discretization.  ...  Multiple approaches in assembling and solving sparse linear systems with NVIDIA GPUs and the Compute Unified Device Architecture (CUDA) are presented and discussed.  ...  Thus, the assembly, solution, and visualization of a dynamic FEM problem can be performed completely on the GPU. This strategy has already been employed in [12] and [10] with impressive results.  ... 
doi:10.1088/1757-899x/10/1/012009 fatcat:5t4fqgjfmng75iwjv2gmcghafq
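
Cecka et al. compare several ways of assembling sparse linear systems. The baseline they improve on is the plain element-wise scatter into a sparse format, sketched below for 1D linear elements: element matrices are accumulated into coordinate (COO) form, duplicates are summed, and the result is converted to CSR. The mesh and function name are assumptions for illustration; this is not one of the paper's GPU strategies.

```python
# Sketch: assemble a global CSR stiffness matrix from element matrices.
# Assumption: 1D linear elements on a uniform mesh of [0, 1], all nodes kept.
from collections import defaultdict

def assemble_csr(n_elems):
    h = 1.0 / n_elems
    ke = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    entries = defaultdict(float)           # (row, col) -> summed value
    for e in range(n_elems):               # element loop: the parallel part on a GPU
        nodes = (e, e + 1)
        for a in range(2):
            for b in range(2):
                entries[(nodes[a], nodes[b])] += ke[a][b]
    n = n_elems + 1
    row_ptr = [0] * (n + 1)
    col_idx, vals = [], []
    for (r, c) in sorted(entries):         # row-major order, columns sorted per row
        col_idx.append(c)
        vals.append(entries[(r, c)])
        row_ptr[r + 1] += 1                # count nonzeros per row
    for i in range(n):                     # prefix sum -> row offsets
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx, vals
```

The duplicate-summing step is what makes naive parallel assembly hard: two elements writing the same (row, col) entry conflict, which is why GPU variants reorganize the work per row or per element.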

GPU acceleration of a non-standard finite element mesh truncation technique for electromagnetics

Jose M. Badia, Adrian Amor-Martin, Jose A. Belloch, Luis Emilio Garcia-Castillo
2020 IEEE Access  
In this paper, we use this kind of device to parallelize FE-IIEE (Finite Element-Iterative Integral Equation Evaluation), a non-standard finite element mesh truncation technique introduced by two of the  ...  The proposed implementation using CUDA applies different optimization techniques to improve performance.  ...  However, there is no significant improvement in performance derived from this strategy.  ... 
doi:10.1109/access.2020.2993103 fatcat:42kpzi7vlbe6pl5hj55uep6g5i

Performance comparison of GPU based Jacobi solvers using CUDA provided synchronization methods

Maria Aslam, Omer Riaz, Shahzad Mumtaz, Ali Daniyal Asif
2020 IEEE Access  
The GPU achieved a maximum speedup of 46 times using a GTX 1060 and 60 times using a Quadro P4000 with double-precision computations, compared with a sequential implementation on a Core i7-8750H.  ...  During this work, a parallel implementation of the finite element method (FEM) for Poisson's equation on shared-memory architectures as well as on GPGPUs has been observed to identify the computationally most expensive  ...  Thus, when there is a high number of threads, the number of registers available per thread is restricted, which is one of the major reasons why high occupancy may actually degrade GPU performance.  ... 
doi:10.1109/access.2020.2973669 fatcat:w7vgp2cv7fam3be4zrpr4tmuwu
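
The paper compares synchronization methods for GPU Jacobi solvers. The reason synchronization matters is visible in the iteration itself: each sweep reads only the previous iterate, so all entries can be updated in parallel, but every thread must finish sweep k before any starts sweep k + 1. A minimal sketch, assuming the model system tridiag(-1, 2, -1) x = b (a 1D Poisson stand-in, not the paper's discretization):

```python
# Sketch: Jacobi iteration for tridiag(-1, 2, -1) x = b.
# Each sweep reads only the old iterate, so every entry can be updated by an
# independent GPU thread; a grid-wide synchronization is needed between sweeps.

def jacobi(b, tol=1e-10, max_sweeps=100000):
    n = len(b)
    x = [0.0] * n
    for sweep in range(max_sweeps):
        x_new = [0.0] * n
        for i in range(n):                        # independent updates
            left = x[i - 1] if i > 0 else 0.0
            right = x[i + 1] if i < n - 1 else 0.0
            x_new[i] = (b[i] + left + right) / 2.0
        diff = max(abs(a - c) for a, c in zip(x_new, x))
        x = x_new                                 # buffer swap: the sync point
        if diff < tol:
            return x, sweep + 1
    return x, max_sweeps
```

With b_i = h^2 (f = 1, h = 1/8) the converged iterate reproduces x(1 - x)/2 at the nodes; the slow convergence (hundreds of sweeps even for 7 unknowns) is why the per-sweep synchronization cost is worth optimizing.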

An Iterative Parallel Solver in GPU Applied to Frequency Domain Linear Water Wave Problems by the Boundary Element Method

Jorge Molina-Moya, Alejandro Enrique Martínez-Castro, Pablo Ortiz
2018 Frontiers in Built Environment  
In this paper a parallel iterative solver based on the Generalized Minimum Residual Method (GMRES) with complex-valued coefficients is explored, with applications to the Boundary Element Method (BEM).  ...  The solver is designed to be executed on a GPU (Graphics Processing Unit) device, exploiting its massively parallel capabilities.  ...  , with number of threads ≥ N; 23: end if; 24: Copy the computed vector from GPU to CPU; 25: Free all the global memory in GPU. Algorithm 3: Matrix-Vector Product (GPU)  ... 
doi:10.3389/fbuil.2018.00069 fatcat:qrth76ukdbdp7bwd7fb34tnvau
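
To make the GMRES description concrete, here is a minimal sketch of unrestarted GMRES for a dense complex system, using Arnoldi with modified Gram-Schmidt and complex Givens rotations. It is a generic textbook variant written under our own assumptions (no restart, no preconditioner, dense list-of-lists matrix), not the authors' GPU solver; the `matvec` step is the part their Algorithm 3 offloads to the GPU.

```python
# Sketch: unrestarted GMRES for a dense complex system A x = b (pure Python).
# Assumptions: A is a list of rows of complex numbers; no restarts, no
# preconditioner. matvec is the step a GPU kernel would parallelize per row.

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def vdot(x, y):
    """Hermitian inner product: sum of conj(x_i) * y_i."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return abs(vdot(x, x)) ** 0.5

def gmres(a, b, tol=1e-12, max_iter=None):
    n = len(b)
    beta = norm(b)
    if beta == 0.0:
        return [0j] * n
    q = [[bi / beta for bi in b]]    # orthonormal Arnoldi basis
    r_cols = []                      # columns of the rotated (triangular) Hessenberg matrix
    cs, sn = [], []                  # Givens rotation coefficients
    g = [complex(beta)]              # rotated right-hand side beta * e_1
    for k in range(max_iter or n):
        w = matvec(a, q[k])
        col = []
        for j in range(k + 1):       # modified Gram-Schmidt
            hjk = vdot(q[j], w)
            w = [wi - hjk * qi for wi, qi in zip(w, q[j])]
            col.append(hjk)
        h_next = norm(w)
        for j in range(k):           # apply earlier rotations to the new column
            top = cs[j] * col[j] + sn[j] * col[j + 1]
            col[j + 1] = -sn[j].conjugate() * col[j] + cs[j] * col[j + 1]
            col[j] = top
        # new rotation [[c, s], [-conj(s), c]] that zeroes h_next below col[k]
        diag = col[k]
        denom = (abs(diag) ** 2 + h_next ** 2) ** 0.5
        if abs(diag) == 0.0:
            c, s = 0.0, 1.0 + 0j
        else:
            c, s = abs(diag) / denom, (diag / abs(diag)) * h_next / denom
        col[k] = c * diag + s * h_next
        cs.append(c)
        sn.append(s)
        g.append(-s.conjugate() * g[k])
        g[k] = c * g[k]
        r_cols.append(col)
        if abs(g[k + 1]) <= tol * beta or h_next == 0.0:
            break                    # |g[k + 1]| is the current residual norm
        q.append([wi / h_next for wi in w])
    m = len(r_cols)
    y = [0j] * m
    for i in reversed(range(m)):     # back-substitution R y = g
        acc = g[i] - sum(r_cols[j][i] * y[j] for j in range(i + 1, m))
        y[i] = acc / r_cols[i][i]
    return [sum(q[j][row] * y[j] for j in range(m)) for row in range(n)]
```

The Givens rotations keep the least-squares problem triangular at every step, so the residual norm is available as |g[k + 1]| without forming x until convergence.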

A GPU-based caching strategy for multi-material linear elastic FEM on regular grids

Christian Schlinkmann, Michael Roland, Christian Wolff, Patrick Trampert, Philipp Slusallek, Stefan Diebels, Tim Dahmen, Anotida Madzvamuse
2020 PLoS ONE  
In this study, we present a novel strategy for applying the finite element method (FEM) to very-high-resolution linear elastic problems on graphics processing units (GPUs).  ...  The approach exploits regularities in the system matrix that occur in regular hexahedral grids to achieve cache-friendly matrix-free FEM.  ...  Thorsten Tjardes from the Witten-Herdecke University for providing the clinical CT data used in this paper.  ... 
doi:10.1371/journal.pone.0240813 pmid:33125404 fatcat:mxmtuq7z7bc4tgj5pqnjmrsc2e
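
The regularity exploited here is that, on a regular grid, every element has the same geometry, so an element stiffness matrix depends only on the element's material. A minimal sketch of that idea (precompute one local matrix per material, then apply the operator matrix-free) is shown below. It uses 1D bar elements and integer material IDs as hypothetical stand-ins for the paper's hexahedra and material labels, and is not the authors' caching scheme.

```python
# Sketch: matrix-free operator on a regular 1D grid where all elements share
# the same length h, so element stiffness differs only by material.
# Assumptions: 1D bars (not hexahedra); material IDs and moduli are hypothetical.

def build_material_cache(youngs_moduli, h):
    """One precomputed local stiffness matrix per material ID."""
    base = [[1.0, -1.0], [-1.0, 1.0]]
    return {mat: [[e_mod / h * v for v in row] for row in base]
            for mat, e_mod in youngs_moduli.items()}

def matrix_free_apply(u, element_material, cache):
    """y = K u using only the cached local matrices, no global K."""
    y = [0.0] * len(u)
    for e, mat in enumerate(element_material):
        ke = cache[mat]              # small, shared, read-only: cache-friendly
        a, b = e, e + 1
        y[a] += ke[0][0] * u[a] + ke[0][1] * u[b]
        y[b] += ke[1][0] * u[a] + ke[1][1] * u[b]
    return y
```

Memory then scales with the number of materials rather than the number of elements, which is what allows very high resolutions to fit on a GPU.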

Acceleration of tensor-product operations for high-order finite element methods [article]

Kasia Świrydowicz, Noel Chalmers, Ali Karakus, Timothy Warburton
2017 arXiv   pre-print
This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods.  ...  We give a guided overview of optimization strategies and we present a performance model that allows us to compare the efficacy of these optimizations against an empirically calibrated roofline.  ...  Acknowledgements This research was supported in part by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S.  ... 
arXiv:1711.00903v2 fatcat:agrgtegmybdvrku52dphq4vjea
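
The performance model mentioned above is compared against a roofline. In its basic form, the roofline bounds a kernel's attainable throughput by the smaller of peak compute and arithmetic intensity times memory bandwidth; an "empirically calibrated" roofline substitutes measured values for the vendor peaks. A minimal sketch, where the hardware numbers used in the usage note are hypothetical placeholders:

```python
# Sketch of the basic roofline bound. Inputs are caller-supplied; any
# peak and bandwidth values are placeholders, not measured data.

def arithmetic_intensity(flops, bytes_moved):
    """FLOPs performed per byte moved to or from DRAM."""
    return flops / bytes_moved

def roofline_bound(ai, peak_gflops, bandwidth_gbs):
    """Attainable GFLOP/s for a kernel with arithmetic intensity ai."""
    return min(peak_gflops, ai * bandwidth_gbs)

def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity above which a kernel becomes compute-bound."""
    return peak_gflops / bandwidth_gbs
```

For a hypothetical device with 7000 GFLOP/s peak and 900 GB/s bandwidth, a kernel at 0.5 FLOP/byte is capped at 450 GFLOP/s, while one at 20 FLOP/byte can reach the 7000 GFLOP/s peak.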

A matrix-free high-order solver for the numerical solution of cardiac electrophysiology [article]

Pasquale Claudio Africa, Matteo Salvador, Paola Gervasio, Luca Dede', Alfio Quarteroni
2022 arXiv   pre-print
We combine sum-factorization with vectorization, thus allowing for a very efficient use of high-order polynomials in a high performance computing framework.  ...  We also implement a matrix-free Geometric Multigrid preconditioner that entails better performance in terms of linear solver iterations than state-of-the-art matrix-based Algebraic Multigrid preconditioners  ...  PCA and MS developed the computer code, performed the numerical simulations and post-processed the numerical results.  ... 
arXiv:2205.05136v2 fatcat:6yme5qanhvantoecys233bcq6q
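
Both this entry and the tensor-product paper above rely on sum factorization: a tensor-product operator is applied one coordinate direction at a time instead of as one large dense matrix. A minimal 2D sketch, with a naive version for comparison (function names are ours, and no vectorization is attempted):

```python
# Sketch of sum factorization for a 2D tensor-product operator:
#   v[i][j] = sum over k, l of A[i][k] * B[j][l] * u[k][l]
# One direction at a time costs O(n^3) instead of O(n^4) for the naive sum
# (in d dimensions: O(n^(d+1)) instead of O(n^(2d))).

def apply_tensor_product(a, b, u):
    n_k, n_l = len(u), len(u[0])
    # contract the second index with B
    w = [[sum(b[j][l] * u[k][l] for l in range(n_l)) for j in range(len(b))]
         for k in range(n_k)]
    # contract the first index with A
    return [[sum(a[i][k] * w[k][j] for k in range(n_k)) for j in range(len(b))]
            for i in range(len(a))]

def apply_naive(a, b, u):
    return [[sum(a[i][k] * b[j][l] * u[k][l]
                 for k in range(len(u)) for l in range(len(u[0])))
             for j in range(len(b))] for i in range(len(a))]
```

Both functions give the same result; the factored form is what makes high polynomial orders affordable, since the savings grow with the order n.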

An explicit dynamics GPU structural solver for thin shell finite elements

A. Bartezzaghi, M. Cremonesi, N. Parolini, U. Perego
2015 Computers & structures  
, one of the main producers of graphics cards, and of improved, high-performance GPU (Graphics Processing Unit) boards, GPGPU (General Purpose programming on GPU) is attracting increasing interest in  ...  A speedup of more than 40 is achieved with respect to state-of-the-art commercial codes running on CPU, obtaining real-time simulations in some cases on commodity hardware.  ...  The authors gratefully acknowledge the support of NVIDIA Corporation with the donation of the Tesla K20 GPU used for this research.  ... 
doi:10.1016/j.compstruc.2015.03.005 fatcat:q3k5hks2qrdhfdteah4pfvkbs4
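
Explicit dynamics solvers are GPU-friendly because, with a lumped (diagonal) mass matrix, each time step is a sequence of independent per-node updates. A minimal sketch of the standard central-difference (leapfrog) scheme follows; treating it as the scheme used in this paper is an assumption, and `internal_force` is a hypothetical caller-supplied callback (in a real code it would be assembled from element contributions).

```python
# Sketch: explicit central-difference time stepping with a lumped mass matrix.
# Assumptions: no damping or external load; internal_force(u) returns the
# internal force vector for displacements u (hypothetical callback).

def central_difference(u0, v0, masses, internal_force, dt, n_steps):
    u = list(u0)
    f = internal_force(u)
    # half-step velocity: v_{1/2} = v_0 + (dt / 2) * a_0, with a = -f_int / m
    v = [vi + 0.5 * dt * (-fi / mi) for vi, fi, mi in zip(v0, f, masses)]
    for _ in range(n_steps):
        u = [ui + dt * vi for ui, vi in zip(u, v)]        # u_{n+1}
        f = internal_force(u)
        v = [vi + dt * (-fi / mi)                         # v_{n+3/2}
             for vi, fi, mi in zip(v, f, masses)]
    return u
```

For a unit mass on a unit spring released from u = 1, 1000 steps of dt = 0.001 give u close to cos(1). The scheme is only conditionally stable, so dt must stay below the critical step of the smallest element.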

GPU-based acceleration of computational electromagnetics codes

Danilo De Donno, Alessandra Esposito, Giuseppina Monti, Luca Catarinucci, Luciano Tarricone
2012 International journal of numerical modelling  
The solution of large and complex electromagnetic (EM) problems often leads to a substantial demand for high-performance computing resources and strategies.  ...  In the last decades, graphics processing units (GPUs) have gained popularity in scientific computing as a low-cost and powerful parallel architecture.  ...  Such code is appreciated for its efficiency due to advanced optimization strategies. Element-wise product, axpy, and axpby routines were implemented from scratch.  ... 
doi:10.1002/jnm.1849 fatcat:bzwa4k5zvjeb7hhanifzjegfx4
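
The routines named in the last snippet are standard BLAS-1 style vector operations. Their semantics can be sketched as plain element-wise loops, where each index would map to one GPU thread. The signatures below are assumptions, and these versions return new lists rather than updating y in place.

```python
# Sketch of element-wise product, axpy and axpby semantics.
# Signatures are illustrative; results are returned as new lists.

def elementwise_product(x, y):
    """z_i = x_i * y_i"""
    return [xi * yi for xi, yi in zip(x, y)]

def axpy(alpha, x, y):
    """alpha * x + y"""
    return [alpha * xi + yi for xi, yi in zip(x, y)]

def axpby(alpha, x, beta, y):
    """alpha * x + beta * y"""
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]
```

These kernels have very low arithmetic intensity, so on a GPU their speed is set by memory bandwidth rather than by compute throughput.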

DFT-FE 1.0: A massively parallel hybrid CPU-GPU density functional theory code using finite-element discretization [article]

Sambit Das, Phani Motamarri, Vishal Subramanian, David M. Rogers, Vikram Gavini
2022 arXiv   pre-print
This work involves improvements in the real-space formulation – via an improved treatment of the electrostatic interactions that substantially enhances the computational efficiency – as well as high-performance  ...  Notably, owing to the parallel-scaling of the GPU implementation, we obtain wall-times of 80-140 seconds for full ground-state calculations, with stringent accuracy, on benchmark systems containing 6,000  ...  In the case of the CF step, which involves sparse-dense matrix products, we achieved a high GPU throughput performance of 54% of the V100 GPU FP64 peak in a multi-node job with a total of 24 GPUs for a  ... 
arXiv:2203.07820v2 fatcat:kjqywltt5zbbhfw43xuniaowki

Real-time Simulation and Optimization of Elastic Aircraft Vehicle Based on Multi-GPU Workstation

Binxing Hu, Xingguo Li
2019 IEEE Access  
Meanwhile, an innovative parallel algorithm for computing the element stiffness matrices of the finite element model is designed for the GPU architecture.  ...  of dual GPUs, enabling real-time simulation of the flexible aircraft with 1200 elements within 20 ms.  ...  With the gradual improvement of GPU performance and the abundance of interfaces and libraries, OpenMP has taken a subordinate position in high-performance computing.  ... 
doi:10.1109/access.2019.2946684 fatcat:v2arypyusbgdtglub5t2b5dyoi
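
Element stiffness matrices parallelize naturally because each element's matrix depends only on that element's properties. A minimal batched sketch is shown below. It assumes 2-node Euler-Bernoulli beam elements (DOFs w1, theta1, w2, theta2), a hypothetical choice since the snippet does not state the element type used for the aircraft model.

```python
# Sketch: batched element stiffness computation, one independent task per
# element (one GPU thread or block per element in a CUDA version).
# Assumption: 2-node Euler-Bernoulli beam elements; not the paper's stated model.

def beam_element_stiffness(ei, length):
    c = ei / length ** 3
    le = length
    rows = (
        [12.0, 6.0 * le, -12.0, 6.0 * le],
        [6.0 * le, 4.0 * le * le, -6.0 * le, 2.0 * le * le],
        [-12.0, -6.0 * le, 12.0, -6.0 * le],
        [6.0 * le, 2.0 * le * le, -6.0 * le, 4.0 * le * le],
    )
    return [[c * v for v in row] for row in rows]

def batched_stiffness(eis, lengths):
    """Element matrices for all elements; each entry is computed independently."""
    return [beam_element_stiffness(ei, le) for ei, le in zip(eis, lengths)]
```

A quick sanity check is that each matrix is symmetric and maps a rigid translation (w1 = w2 = 1, zero rotations) to zero force.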