GPU-enabled efficient executions of radiation calculations in climate modeling

Sai Kiran Korwar, Sathish Vadhiyar, Ravi S Nanjundiah
2013 20th Annual International Conference on High Performance Computing  
By exploiting the parallelism offered by the GPU and using asynchronous executions on the CPU and GPU, we obtain a speed-up of about 26× for the routine radabs and about 5.6× for routine radcswmx.  ...  This work attempts to extract the performance of GPUs to enable faster execution of CESM and obtain better model throughput.  ...  CONCLUSIONS AND FUTURE WORK The use of GPUs to accelerate applications that are embarrassingly parallel leads to significant performance improvements.  ... 
doi:10.1109/hipc.2013.6799141 dblp:conf/hipc/KorwarVN13 fatcat:jrnyadx3wraazl6rzqnhtfue7m

A Parallel Hybrid Intelligence Algorithm for Solving Conditional Nonlinear Optimal Perturbation to Identify Optimal Precursors of North Atlantic Oscillation

Bin Mu, Jing Li, Shijin Yuan, Xiaodan Luo, Guokun Dai
2019 Nonlinear Processes in Geophysics Discussions  
It has a profound influence on the strength of westerly winds as well as the storm tracks in North Atlantic, thus affecting winter climate in Northern Hemisphere.  ...  In this paper, conditional nonlinear optimal perturbation (CNOP), which has been widely used in research on the optimal precursor (OPR) of climatic event, is adopted to investigate which kind of initial  ...  Since GPU is suitable for parallel computing on a large scale, it can significantly improve the execution performance of climate models.  ... 
doi:10.5194/npg-2019-25 fatcat:odynt2v5ujbxfgl3e74lqdsdme

Optimizing high-resolution Community Earth System Model on a heterogeneous many-core supercomputing platform

Shaoqing Zhang, Haohuan Fu, Lixin Wu, Yuxuan Li, Hong Wang, Yunhui Zeng, Xiaohui Duan, Wubing Wan, Li Wang, Yuan Zhuang, Hongsong Meng, Kai Xu (+32 others)
2020 Geoscientific Model Development  
The refactoring and optimizing efforts have improved the simulation speed of CESM-HR from 1 SYPD (simulation years per day) to 3.4 SYPD (with output disabled) and supported several hundred years of pre-industrial  ...  (CPUs) and six graphics processing units (GPUs) inside one node.  ...  All authors contributed to the improvement of ideas, software testing, experimental evaluation, and paper writing and proofreading. Acknowledgements.  ... 
doi:10.5194/gmd-13-4809-2020 fatcat:i7pxvfrgk5dhljfg5azi3epj6e

Deep and Shallow convections in Atmosphere Models on Intel Xeon Phi Coprocessor Systems [article]

Srinivasan Ramesh, Sathish Vadhiyar, Ravi Nanjundiah, PN Vinayachandran
2017 arXiv   pre-print
We also employ proportional partitioning of independent column computations across both the CPU and coprocessor cores based on the performance ratio of the computations on the heterogeneous resources.  ...  These calculations also present significant load imbalances due to varying cloud covers over different regions of the grid.  ...  ACKNOWLEDGMENTS This project is supported by the Intel® Parallel Computing Centre for Modelling Monsoons and Tropical Climate (IPCC-MMTC), India sponsored by the Intel® Corporation.  ... 
arXiv:1711.00289v1 fatcat:bzxyru5cmzc6vpnlupxxxvwtee

NICEST2 - D4.5: First report on the identified bottlenecks for an efficient usage of Nordic ESMs on EuroHPC

Alok Gupta, Jean Iaquinta, Anne Fouilloux, Oskar Landgren
2021 Zenodo  
Report on the bottlenecks that would hinder the efficient usage of Nordic ESMs on EuroHPC and possible remediation actions (I/Os, adding GPU support, etc.) with clear information on costs in terms of manpower  ...  For instance, being able to organize meetings/hackathons (online or face to face) with both experts from NorESM and EC-EARTH has been highlighted as an important requirement by those involved in the GPU  ...  the performance of the code on GPUs.  ... 
doi:10.5281/zenodo.4749515 fatcat:hsvwkv2ayfhztaz5qistbdolm4

A Feasibility Study on Porting the Community Land Model onto Accelerators Using Openacc

D. Wang, W. Wu, F. Winkler, W. Ding, O. Hernandez
2014 International Journal of Advanced Computer Science and Applications  
Even though it is a non-intensive kernel, on a single 16-core computing node, the performance (based on the actual computation time using one GPU) of the OpenACC implementation is 2.3 times faster than that of OpenMP  ...  of data parallelization and the benefit of data movement provided by the current implementation of OpenACC.  ...  break all the loops into parallel computation on GPU cores, and copy back these datastreams.  ... 
doi:10.14569/ijacsa.2014.051203 fatcat:as4dqw3pmbg4zjhekpxfbixchm

Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs [article]

Jiannan Tian, Sheng Di, Xiaodong Yu, Cody Rivera, Kai Zhao, Sian Jin, Yunhe Feng, Xin Liang, Dingwen Tao, Franck Cappello
2021 arXiv   pre-print
architectures. (3) We carefully optimize each of cuSZ+ kernels by leveraging state-of-the-art CUDA parallel primitives. (4) We evaluate cuSZ+ using seven real-world HPC application datasets on V100 and  ...  Experiments show cuSZ+ improves the compression throughputs and ratios by up to 18.4X and 5.3X, respectively, over cuSZ on the tested datasets.  ...  The material was supported by the U.S. Department of Energy, Office of Science, under contract DE-AC02-06CH11357.  ... 
arXiv:2105.12912v3 fatcat:a5ayaxi4brdpvjzjqofqkmdfpu

High-Performance Distributed Multi-Model / Multi-Kernel Simulations: A Case-Study in Jungle Computing [article]

Niels Drost, Jason Maassen, Maarten A.J. van Meersbergen, Henri E. Bal, F. Inti Pelupessy, Simon Portegies Zwart, Michael Kliphuis, Henk A. Dijkstra, Frank J. Seinstra
2012 arXiv   pre-print
We make use of the software developed in the Ibis project, which addresses many of the problems faced when running applications on Jungle Computing Systems.  ...  One striking example of applications that can benefit greatly of Jungle Computing Systems are Multi-Model / Multi-Kernel simulations.  ...  The authors wish to kindly thank Vianney Govers and Kees Verstoep for providing support for the LGM and DAS-4, and setting up the LGM/DAS network connection.  ... 
arXiv:1203.0321v1 fatcat:bxi74ww3mvhtjkepflywwm7mhu

CEAZ: Accelerating Parallel I/O Via Hardware-Algorithm Co-Designed Adaptive Lossy Compression [article]

Chengming Zhang, Sian Jin, Tong Geng, Jiannan Tian, Ang Li, Dingwen Tao
2021 arXiv   pre-print
As parallel computers continue to grow to exascale, the amount of data that needs to be saved or transmitted is exploding.  ...  To this end, many previous works have studied using error-bounded lossy compressors to reduce the data size and improve the I/O performance.  ...  We evaluate the impact of update frequency on the final compression ratio. We perform the experiments on both CESM and HACC.  ... 
arXiv:2106.13306v2 fatcat:42fvquu3trcgxncxwdl5izksra

cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data [article]

Jiannan Tian, Sheng Di, Kai Zhao, Cody Rivera, Megan Hickman Fulp, Robert Underwood, Sian Jin, Xin Liang, Jon Calhoun, Dingwen Tao, Franck Cappello
2020 arXiv   pre-print
It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU supported lossy compressor.  ...  . (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth  ...  ACKNOWLEDGMENTS This research was supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations - the Office of Science and the National  ... 
arXiv:2007.09625v1 fatcat:f4sq3abcmvehvm4b6wishmudie

SIMD Lossy Compression for Scientific Data [article]

Griffin Dube, Jiannan Tian, Sheng Di, Dingwen Tao, Jon Calhoun, Franck Cappello
2022 arXiv   pre-print
However, some HPC systems and applications do not use GPUs, and could still benefit from the fine-grained parallelism of this method.  ...  Recent work proposes a parallel dual prediction/quantization algorithm for GPUs which removes these dependencies.  ...  In addition, SZ takes advantage of on-node parallelization such as OpenMP, GPUs [12]. C. SIMD Many operations in HPC applications such as linear algebra exhibit large amounts of data parallelism.  ... 
arXiv:2201.04614v1 fatcat:sbv7n4lnjbhj3kzq7fg5fr5mby

Runtime Tracing of the Community Earth System Model: Feasibility Study and Benefits

Jens Domke, Dali Wang
2012 Procedia Computer Science  
The Community Earth System Model (CESM) is one of US's leading earth system modeling frameworks, which has decades of development history and was embraced by a large, active user community.  ...  Then we present an offline global community land model simulation within the CESM framework to demonstrate the procedure of runtime tracing of CESM using the Vampir toolset.  ...  Acknowledgments The authors thank the Vampir Team of the Center for Information Services and High Performance Computing  ... 
doi:10.1016/j.procs.2012.04.213 fatcat:tdbgofrg5vgovade5evwfghwzy

Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs [article]

Cody Rivera, Sheng Di, Jiannan Tian, Xiaodong Yu, Dingwen Tao, Franck Cappello
2022 arXiv   pre-print
Specifically, we take full advantage of CUDA GPU architectures by using shared memory on decoding/writing phases, online tuning the amount of shared memory to use, improving memory access patterns, and  ...  In this work, we aim to significantly improve the Huffman decoding performance for cuSZ, thus improving the overall decompression performance in turn.  ...  The material was supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357.  ... 
arXiv:2201.09118v2 fatcat:4jzndm2n4ngirjvogrrzsu4exu

An automatic performance model-based scheduling tool for coupled climate system models

Nan Ding, Wei Xue, Zhenya Song, Haohuan Fu, Shiming Xu, Weimin Zheng
2018 Journal of Parallel and Distributed Computing  
As a petascale supercomputer, Tianhe-1A features a massively parallel processor (MPP) architecture of hybrid CPU-GPU computing. A proprietary  ...  nodes for a total of 192 cores.  ...  We test the performance improvement of B1850 f19 g16 on Tianhe-1A and HP cluster. We profile the case B1850 f19 g16 at two parallelisms (12 and 96 processes) to determine the model parameters.  ... 
doi:10.1016/j.jpdc.2018.01.002 fatcat:lasxgh5a5jbnvak3gmmcnnbube

GPU-RRTMG_SW: Accelerating a Shortwave Radiative Transfer Scheme on GPU

Zhenzhen Wang, Yuzhu Wang, Xiaocong Wang, Fei Li, Chen Zhou, Hangtian Hu, Jinrong Jiang
2021 IEEE Access  
Parallel computing technology based on graphics processing units (GPUs) has the characteristics of high parallelism, multi-threaded and multi-core processors, and high memory bandwidth.  ...  With this technology, transferring data between a CPU and GPU is sped up by approximately 86%. When the total performance of CC-RRTMG_SW on one K20 GPU is analyzed, there is a 44% improvement.  ... 
doi:10.1109/access.2021.3087507 fatcat:kwrg4nmbxvhc3lqfhynrjw7wlm