Implementation of GPU-FFT into Planewave Based First Principles Calculation Method

Hidekazu TOMONO, Masaru AOKI, Toshiaki IITAKA, Kazuo TSUMURAYA
2011 Journal of Computational Science and Technology  
The use of the multi-CPU system with the GPU FFT accelerates by 2.2f, where f is the acceleration factor of the multi-CPU system.  ...  The single precision GPU calculation is implementable in any self-consistent electronic structure code, except for the eigensolver part in the DFT codes.  ...  Acknowledgments: An earlier stage of the numerical calculations on the PWscf(FFTW) was carried out on Altix3700 BX2 at YITP (Yukawa Institute for Theoretical Physics) in Kyoto University and on SCore system  ... 
doi:10.1299/jcst.5.89 fatcat:t2rogulddzhszlumzmmp54w3be

On the Use of GPUs in Density Functional Theory Atomistic Simulations: A Case of Acceleration Success

Laura Escorihuela, Benjamí Martorell
2017 International Journal of Earth & Environmental Sciences  
In this mini-review we have evaluated which factors are crucial to obtain an appropriate acceleration in the process of moving CPU codes to their GPU version: memory transfer, work flows and CPU/GPU ratio  ...  Accelerations up to 20-40 times the pure CPU version of the DFT code have been achieved. This makes that the additional cost of GPUs cards is less than the price/performance obtained.  ...  As previously commented, this step can be extremely accelerated with the combination of CPU and GPUs computation.  ... 
doi:10.15344/2456-351x/2017/133 fatcat:lwrfmydn3rce3fpcwdczx6zqli

Heterogeneous Programming and Optimization of Gyrokinetic Toroidal Code and Large-Scale Performance Test on TH-1A [chapter]

Xiangfei Meng, Xiaoqian Zhu, Peng Wang, Yang Zhao, Xin Liu, Bao Zhang, Yong Xiao, Wenlu Zhang, Zhihong Lin
2013 Lecture Notes in Computer Science  
The GPU version of the GTC code was benchmarked on up to 3072 nodes of the Tianhe-1A supercomputer, which shows about 2x-3x overall speedup comparing NVIDIA M2050 GPUs to Intel Xeon X5670 CPUs.  ...  In this work, we discuss the porting to the GPU platform of the latest production version of the Gyrokinetic Toroidal Code (GTC), which is a petascale fusion simulation code using particle-in-cell method  ...  GPU Acceleration: To simulate electron turbulence, we need many subcycles to track the fast electron motion [6, 7].  ... 
doi:10.1007/978-3-642-38750-0_7 fatcat:c6apqscsgzdpdlolh7xpszvm2e

High-Performance Computing with Accelerators

Volodymyr Kindratenko, Robert Wilhelmson, Robert Brunner, Todd J. Martínez, Wen-mei Hwu
2010 Computing in science & engineering (Print)  
, embedded computing • Graphics Processing Units (GPUs) -Desktop graphics accelerators • Physics Processing Units (PPUs) -Desktop games accelerators • Sony/Toshiba/IBM Cell Broadband Engine  ...  annotations to enable auto parallelization GMAC -Designed to reduce accelerator use barrier • Unified CPU / GPU Address Space: • Same CPU and GPU address • Customizable implicit data transfers  ... 
doi:10.1109/mcse.2010.88 fatcat:h62wjzxikfb7hor3ezt3d5pkei

Optimizing Gpaw On Gpus

Martti Louhivuori
2013 Zenodo  
benefits of GPU acceleration.  ...  While this is true in many cases for small-scale computational problems that can be solved using the processing power of a single computing unit, the efficient usage of multiple GPUs in parallel over multiple  ...  Since communication between computing units is handled by the CPU, the latency of data transfer between the CPU and GPU memories means that one cannot do frequent propagation of data among the parallel  ... 
doi:10.5281/zenodo.831468 fatcat:uvue3xabuzco3ka3uw3pyatrki

Portable acceleration of materials modelling software: CASTEP, GPUs and OpenACC

Matthew Smith, Arjen Tamerus, Phil Hasnip
2022 Computing in science & engineering (Print)  
We present work to port the CASTEP first-principles materials modelling program to accelerators using OpenACC.  ...  We discuss the challenges and opportunities presented by GPU architectures in particular, and the approach taken in the CASTEP OpenACC port.  ... 
doi:10.1109/mcse.2022.3141714 fatcat:gms4lyvhvbbkjll4xnft5frxfu

Acceleration of Large-Scale Electronic Structure Simulations with Heterogeneous Parallel Computing [chapter]

Oh-Kyoung Kwon, Hoon Ryu
2018 High Performance Parallel Computing [Working Title]  
Extending our previous work with Intel Knights Corner coprocessors [12] to the area of GPU computing, this work delivers practical information for technical details that are employed to accelerate empirical  ...  that graphical processing unit (GPU) devices help in saving computing costs in terms of time and energy consumption.  ...  Authors acknowledge the extensive use of KISTI Accelerator Test-bed (KAT) computing resources that are supported by Korea Institute of Science and Technology Information.  ... 
doi:10.5772/intechopen.80997 fatcat:7p367z2konfsbawovb6j74246y

Acceleration of a QM/MM-QMC simulation using GPU

Yutaka Uejima, Tomoharu Terashima, Ryo Maezono
2011 Journal of Computational Chemistry  
The accelerated computational nodes mounting GPU are combined to form a hybrid MPI cluster on which we confirmed the performance linearly scales to the number of nodes.  ...  We accelerated an ab-initio molecular QMC calculation by using GPGPU. Only the bottle-neck part of the calculation is replaced by CUDA subroutine and performed on GPU.  ...  The data is large but read-only, so the data transfer to GPU is required only once at the beginning of a run, not consuming computational cost relative to the whole CPU time.  ... 
doi:10.1002/jcc.21809 pmid:21541960 fatcat:zuazlvphrrcx3pckk2jsn4c52a

Particle-based Semiconductor Device Simulation Accelerated by GPU computing

Akito Suzuki, Takefumi Kamioka, Yoshinari Kamakura, Takanobu Watanabe
2015 Journal of Advanced Simulation in Science and Engineering  
We demonstrate that the parallel computing with graphic processing unit (GPU) effectively accelerates a particle-based carrier transport simulation called EMC/MD method.  ...  The EMC/MD simulation powered by GPU computing is a useful tool to investigate the statistical variability analysis of nano-scale transistors.  ...  T2K-Tsukuba in Center for Computational Sciences of Tsukuba University.  ... 
doi:10.15748/jasse.2.211 fatcat:3clndts3oncixgmdgv5enyccgm

Decreasing Time Consumption of Microscopy Image Segmentation Through Parallel Processing on the GPU [chapter]

Joris Roels, Jonas De Vylder, Yvan Saeys, Bart Goossens, Wilfried Philips
2016 Lecture Notes in Computer Science  
We find that the overall GPU speedup depends on three major factors: 1) the coarse-grained parallelism of the algorithm, 2) the size of the data and 3) the computation/memory transfer ratio.  ...  The computational performance of graphical processing units (GPUs) has improved significantly.  ...  However, because of the high amount of pixel-wise image operations, parallel computing seems computationally interesting. 4 Accelerating programs using the GPU Accelerating SLIC superpixel segmentation  ... 
doi:10.1007/978-3-319-48680-2_14 fatcat:653iipat2nhfrmyawuxefnui7u

GPGPU for orbital function evaluation with a new updating scheme [article]

Yutaka Uejima, Ryo Maezono
2012 arXiv   pre-print
We accelerated an ab-initio QMC electronic structure calculation by using GPGPU.  ...  The bottleneck of the calculation for extended solid systems is replaced by CUDA-GPGPU subroutine kernels which build up spline basis set expansions of electronic orbital functions at each Monte Carlo  ...  Acceleration of Slater Determinant Updating: The bottleneck operation of updating of orbital functions, which are computed by the GPU, gives a system size dependence of O(N^2) [6].  ... 
arXiv:1204.1121v2 fatcat:d4eoasvn4revndmemwr2niao7m

GPU Tuning for First-Principle Electronic Structure Simulations [chapter]

Yue Wu, Weile Jia, Lin-Wang Wang, Weiguo Gao, Long Wang, Xuebin Chi
2013 Lecture Notes in Earth System Sciences  
With increasing demands on hardware in quantum chemistry calculations, modern Graphical Processing Units (GPUs) have great potential meeting the resources of high performance computing.  ...  In this paper we investigate the possibility to accelerate the planewave pseudopotential code PEtot on CUDA architecture.  ...  Driven by the motivation to accelerate the computing speed of this part, we replaced the original level-3 blas routines with GPU implementations.  ... 
doi:10.1007/978-3-642-16405-7_14 fatcat:siiw3gadcjavjoh4nzyn4zpqhu

Implementation of GPU-accelerated back projection for EPR imaging

Zhiwei Qiao, Gage Redler, Boris Epel, Yuhua Qian, Howard Halpern
2015 Journal of X-Ray Science and Technology  
The GPU implementation results in acceleration by over a factor of 200 overall and by over a factor of 3500 if only the computing time is considered.  ...  Some important experiences regarding the implementation of GPU-accelerated backprojection for EPRI are summarized.  ...  However, the GPU backprojection computing time (GS method not including transfer time) is not proportional to the size of the object.  ... 
doi:10.3233/xst-150498 pmid:26410654 pmcid:PMC4825055 fatcat:kd6i752syzbm5mygivx36obqcq

CUDA accelerated robot localization and mapping

Haiyang Zhang, Fred Martin
2013 2013 IEEE Conference on Technologies for Practical Robot Applications (TePRA)  
We present a method to accelerate robot localization and mapping by using CUDA (Compute Unified Device Architecture), the general purpose parallel computing platform on NVIDIA GPUs.  ...  Because computations on the particles are independent of each other in this algorithm, CUDA acceleration should be highly effective.  ...  So, to explore the massive parallel computing power of GPU, we focused on accelerating this step by using CUDA.  ... 
doi:10.1109/tepra.2013.6556350 dblp:conf/tepra/ZhangM13 fatcat:fz5f3d4bxbbxlpfblh23bowdza

Rethinking computer architecture for throughput computing

Wen-mei W. Hwu
2013 2013 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS)  
[Slide diagrams, SAMOS 2013: "Desirable Data Transfer Behavior" and "Actual Data Transfer Behavior", each showing Main Memory (DRAM), CPU, DMA, GPU card (or other accelerator cards) with Device Memory, Network I/O, and Disk I/O]  ... 
doi:10.1109/samos.2013.6621096 dblp:conf/samos/Hwu13 fatcat:nh7fcnab5zcl7f4ovb42yfh7yq