
Towards High Performance Relativistic Electronic Structure Modelling: The EXP-T Program Package [article]

Alexander V. Oleynichenko, Andréi Zaitsevskii, Ephraim Eliav
2020 arXiv   pre-print
Modern challenges arising in the fields of theoretical and experimental physics require new powerful tools for high-precision electronic structure modelling; one of the most promising tools is the relativistic  ...  The software developed makes it possible to achieve a completely new level of accuracy in predicting the properties of atoms and molecules containing heavy and superheavy nuclei.  ...  ACKNOWLEDGEMENTS Authors are grateful to T. A. Isaev, S. V. Kozlov, L. V. Skripnikov, A. V. Stolyarov and L. Visscher for fruitful discussions.  ... 
arXiv:2004.03682v1 fatcat:343huah5qraovcebd7hxyuddmm

CSSI Frameworks: Scalable Modular Software and Methods for High-Accuracy Materials and Condensed Phase Chemistry Simulation

Edgar Solomonik
2022 Zenodo  
New library software infrastructure includes automatic differentiation for tensor optimization, as well as to enable execution of sparse kernels on emerging GPU-based supercomputing architectures.  ...  The methods use efficient high-level software abstractions, implemented as Python-level modules within PySCF that leverage the Cyclops library for massively-parallel execution.  ...  (CCSD) Advances in methods and parallel software • better conditioned basis sets and improved low-scaling DF methods • new automation for AD and contraction of tensor networks • multi-GPU-parallel Python  ... 
doi:10.5281/zenodo.6892232 fatcat:zg6bt27pjfd3foi7emhumtqmzm

Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure

Thomas Herault, Yves Robert, George Bosilca, Robert J. Harrison, Cannada A. Lewis, Edward F. Valeev, Jack J. Dongarra
2021 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime  ...  High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented  ...  The CPU-only implementation in MPQC evaluates tensor V on the fly, as needed; due to the lack of publicly-available efficient kernels for direct evaluation of AO integrals on GPUs (such kernels are under  ... 
doi:10.1109/ipdps49936.2021.00062 fatcat:4d5uwxkmkfdatkhfo76vbgcs5a

Implementation of Relativistic Coupled Cluster Theory for Massively Parallel GPU-Accelerated Computing Architectures

Johann V. Pototschnig, Anastasios Papadopoulos, Dmitry I. Lyakh, Michal Repisky, Loïc Halbert, André Severo Pereira Gomes, Hans Jørgen Aa Jensen, Lucas Visscher
2021 Journal of Chemical Theory and Computation  
The code is designed for parallel execution on many compute nodes with optional GPU coprocessing, accomplished via the new ExaTENSOR back end.  ...  In the current work, we thereby focus on exact two-component methods and demonstrate the accuracy and performance of the software.  ...  Some computer codes used in this research (ExaTENSOR, TAL-SH, partly ExaCorr) were developed during the OLCF-4 Center for Accelerated Application Readiness (CAAR) program funded by the US Department of  ... 
doi:10.1021/acs.jctc.1c00260 pmid:34370471 pmcid:PMC8444343 fatcat:mayywerphfb7rhczg5kfn3cs34

GFCCLib: Scalable and Efficient Coupled-Cluster Green's Function Library for Accurately Tackling Many Body Electronic Structure Problems [article]

Bo Peng, Ajay Panyala, Karol Kowalski, Sriram Krishnamoorthy
2021 arXiv   pre-print
The design of the library is focused on a systematically optimal computing strategy to improve its scalability and efficiency.  ...  However, GFCC calculations on scientific computing clusters usually suffer from expensive higher dimensional tensor contractions in the complex space, expensive interprocess communication, and severe load  ...  From the technical viewpoint, the novel interdisciplinary engineering of tensor contractions library (TAMM), MOR algorithm, efficient compression algorithms for two-electron integrals, and GPUs into a  ... 
arXiv:2010.04768v2 fatcat:y6oyvzl76rh2pbioza2he7j6rm

Work stealing for GPU-accelerated parallel programs in a global address space framework

Humayun Arafat, James Dinan, Sriram Krishnamoorthy, Pavan Balaji, P. Sadayappan
2016 Concurrency and Computation  
In the next subsection, we present experimental results using these two work-stealing strategies for the CCSD(T) application.  ...  We observe that such an approach does not lead to the most efficient execution.  ...  We used a recently developed version of the Tensor Contraction Engine [46] to generate the set of tasks for the CCSD(T) computation.  ... 
doi:10.1002/cpe.3747 fatcat:5nu7aqhhijbtfkhd36c5oec33a

GPU code optimization using abstract kernel emulation and sensitivity analysis

Changwan Hong, Aravind Sukumaran-Rajam, Jinsung Kim, Prashant Singh Rawat, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, P. Sadayappan
2018 Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2018  
ACKNOWLEDGMENTS We thank the reviewers of the paper for valuable feedback that helped improve the paper. This work was supported in part by the National Science Foundation through  ...  synthesize efficient GPU kernels for tensor contractions.  ...  The NWChem suite includes a separate customized GPU kernel for each of 27 tensor contractions for the CCSD(T) method.  ... 
doi:10.1145/3192366.3192397 dblp:conf/pldi/HongSKRKPRS18 fatcat:7ksfhdqttvcslh5ir7dhgfxa5u

Thread-level parallelization and optimization of NWChem for the Intel MIC architecture

Hongzhang Shan, Samuel Williams, Wibe de Jong, Leonid Oliker
2015 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15  
We found that while straightforward application of OpenMP to the deep loop nests associated with the tensor contractions of CCSD(T) was sufficient in attaining high performance, significant effort was  ...  In this paper we explore augmenting two NWChem modules, triples correction of the CCSD(T) and Fock matrix construction, with OpenMP in order that they might run efficiently on future manycore architectures  ...  Ma et al. studied CCSD(T) performance on several GPU platforms using hybrid CPU-GPU execution [14, 15].  ... 
doi:10.1145/2712386.2712391 dblp:conf/ppopp/ShanWJO15 fatcat:2tteaps76fhe5kcasimazvzrhu

MemHC: An Optimized GPU Memory Management Framework for Accelerating Many-body Correlation

Qihan Wang, Zhen Peng, Bin Ren, Jie Chen, Robert G. Edwards
2022 ACM Transactions on Architecture and Code Optimization (TACO)  
In contrast, this work discovers a new optimization dimension for many-body correlation by exploring the optimization opportunities among tensor contractions.  ...  Existing optimizations on many-body correlation mainly focus on individual tensor contractions (e.g., cuBLAS libraries and others).  ...  ACKNOWLEDGMENTS The authors would like to thank the anonymous reviewers for making innumerable helpful suggestions and comments.  ... 
doi:10.1145/3506705 fatcat:fy3y5db3ozccfishi5fax3opqe

Accelerating Auxiliary-Field Quantum Monte Carlo Simulations of Solids with Graphical Processing Unit [article]

Fionn D. Malone, Shuai Zhang, Miguel A. Morales
2020 arXiv   pre-print
By exploiting conservation of crystal momentum in the one- and two-electron integrals we show how to efficiently formulate the algorithm to best utilize current GPU architectures.  ...  of Carbon in the diamond structure to within 0.02 eV of the experimental result.  ...  We thank Joonho Lee for his insistence on our reporting of counterpoise corrected finite basis set cohesive energies and for other helpful criticism.  ... 
arXiv:2003.09468v1 fatcat:hlrqsc5klfbdnduobjv6zljej4

Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

Khaled Z. Ibrahim, Evgeny Epifanovsky, Samuel Williams, Anna I. Krylov
2017 Journal of Parallel and Distributed Computing  
These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations.  ...  While based on matrix-matrix multiplication, these libraries are specialized to exploit symmetries in the molecular structure and in electronic interactions, and thus reduce the size of the tensor representation  ...  We would like to thank the anonymous reviewers for providing suggestions to get a better performance of NWChem runs and improve the presentation in this manuscript.  ... 
doi:10.1016/j.jpdc.2017.02.010 fatcat:mcrxnl4b2vaslg7r35sodabn3e

NWChem: Past, Present, and Future [article]

E. Aprà, E. J. Bylaska, W. A. de Jong, N. Govind, K. Kowalski, T. P. Straatsma, M. Valiev, H. J. J. van Dam, Y. Alexeev, J. Anchell, V. Anisimov, F. W. Aquino, R. Atta-Fynn, J. Autschbach, N. P. Bauman, J. C. Becca, D. E. Bernholdt, K. Bhaskaran-Nair, S. Bogatko, P. Borowski, J. Boschen, J. Brabec, A. Bruner, E. Cauët, Y. Chen, G. N. Chuev, C. J. Cramer, J. Daily, M. J. O. Deegan, T. H. Dunning Jr., M. Dupuis, K. G. Dyall, G. I. Fann, S. A. Fischer, A. Fonari, H. Früchtl, L. Gagliardi, J. Garza, N. Gawande, S. Ghosh, K. Glaesemann, A. W. Götz, J. Hammond, V. Helms, E. D. Hermes, K. Hirao, S. Hirata, M. Jacquelin, L. Jensen, B. G. Johnson, H. Jónsson, R. A. Kendall, M. Klemm, R. Kobayashi, V. Konkov, S. Krishnamoorthy, M. Krishnan, Z. Lin, R. D. Lins, R. J. Littlefield, A. J. Logsdail, K. Lopata, W. Ma, A. V. Marenich, J. Martin del Campo, D. Mejia-Rodriguez, J. E. Moore, J. M. Mullin, T. Nakajima, D. R. Nascimento, J. A. Nichols, P. J. Nichols, J. Nieplocha, A. Otero de la Roza, B. Palmer, A. Panyala, T. Pirojsirikul, B. Peng, R. Peverati, J. Pittner, L. Pollack, R. M. Richard, P. Sadayappan, G. C. Schatz, W. A. Shelton, D. W. Silverstein, D. M. A. Smith, T. A. Soares, D. Song, M. Swart, H. L. Taylor, G. S. Thomas, V. Tipparaju, D. G. Truhlar, K. Tsemekhman, T. Van Voorhis, Á. Vázquez-Mayagoitia, P. Verma, O. Villa, A. Vishnu, K. D. Vogiatzis, D. Wang, J. H. Weare, M. J. Williamson, T. L. Windus, K. Woliński, A. T. Wong, Q. Wu, C. Yang, Q. Yu, M. Zacharias, Z. Zhang, Y. Zhao, R. J. Harrison
2020 Journal of Chemical Physics   accepted
Specialized computational chemistry packages have permanently reshaped the landscape of chemical and materials science by providing tools to support and guide experimental efforts and for the prediction  ...  and predictive many-body techniques that describe correlated behavior of electrons in molecular and condensed phase systems at different levels of theory.  ...  For example, for the CCSD formulation one obtains $\langle\Phi|(H_N e^{T_1+T_2})_C|\Phi\rangle = \Delta E_{\mathrm{CCSD}}$ (5), $\langle\Phi_i^a|(H_N e^{T_1+T_2})_C|\Phi\rangle = 0$ (6), $\langle\Phi_{ij}^{ab}|(H_N e^{T_1+T_2})_C|\Phi\rangle = 0$ (7), where $H_N$ is the electronic Hamiltonian  ... 
doi:10.1063/5.0004997 pmid:32414274 arXiv:2004.12023v2 fatcat:zzz2vczvkjbnjmqz2ssvd6ouku

An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

Dmitry I. Lyakh
2015 Computer Physics Communications  
An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU.  ...  The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU.  ...  appropriate for an efficient CPU cache utilization or for coalescing global memory accesses on an NVidia GPU.  ... 
doi:10.1016/j.cpc.2014.12.013 fatcat:2d76khh77ran7bnh4a67wnsype

Recent developments in the general atomic and molecular electronic structure system

Giuseppe M. J. Barca, Colleen Bertoni, Laura Carrington, Dipayan Datta, Nuwan De Silva, J. Emiliano Deustua, Dmitri G. Fedorov, Jeffrey R. Gour, Anastasia O. Gunina, Emilie Guidez, Taylor Harville, Stephan Irle (+30 others)
2020 Journal of Chemical Physics  
A discussion of many of the recently implemented features of GAMESS (General Atomic and Molecular Electronic Structure System) and LibCChem (the C++ CPU/GPU library associated with GAMESS) is presented  ...  Many new coupled cluster theory methods have been implemented in GAMESS, as have multiple levels of density functional/tight binding theory.  ...  The members of the GAMESS development group are very grateful for this support.  ... 
doi:10.1063/5.0005188 pmid:32321259 fatcat:w5nb7cnclrgqdl4x6ztbgt25oy

Efficient Primitives for Standard Tensor Linear Algebra

David M. Rogers
2016 Proceedings of the XSEDE16 on Diversity, Big Data, and Science at Scale - XSEDE16  
The execution of the present API achieves peak performance on the same order of magnitude as for vendor-optimized GEMM by utilizing a code generator to output CUDA source code for all computational kernels  ...  Despite their relatively low operation count, we show that these transposition steps can become performance limiting in typical use cases for BLAS on tensors.  ...  tensor framework (CTF) [19] is a similar project aimed at implementing efficient tensor contractions for large tensors stored in distributed global memory.  ... 
doi:10.1145/2949550.2949580 dblp:conf/xsede/Rogers16 fatcat:zfsp4hjqb5cvniz6rcs2cz6j34