Memory-Constrained Data Locality Optimization for Tensor Contractions [chapter]

Alina Bibireata, Sandhya Krishnan, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, David E. Bernholdt, Venkatesh Choppella
2004 Lecture Notes in Computer Science  
In this paper, we address the memory-constrained data-locality optimization problem in the context of this class of computations.  ...  The accurate modeling of the electronic structure of atoms and molecules involves computationally intensive tensor contractions over large multidimensional arrays.  ...  Acknowledgments We thank the National Science Foundation for its support of this research through the Information Technology Research program (CHE-0121676 and CHE-0121706), NSF grants CCR-0073800 and EIA  ... 
doi:10.1007/978-3-540-24644-2_7 fatcat:ixthi66h3rgr5onjrwdobvz2oe

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Gerald Baumgartner, J. Ramanujam, P. Sadayappan
2012 Journal of Parallel and Distributed Computing  
order [44] and loop fusion [43, 42] for reducing memory requirements, space-time trade-off optimization [15], data locality optimization, which combines loop fusion and tiling for reducing disc-to-memory  ...  The order of indices of the intermediate tensors is not constrained.  ...  GEMM implementations can achieve such a high performance, since they use tiling for all levels in the memory hierarchy to optimize temporal locality.  ... 
doi:10.1016/j.jpdc.2011.09.006 fatcat:xxgixibzbzhr3o6sw34sympcpa

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms [chapter]

Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, David E. Bernholdt, Venkatesh Choppella
2003 Lecture Notes in Computer Science  
This paper describes an approach to synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations.  ...  Tensor contraction expressions arise in many accurate computational models of electronic structure.  ...  We would also like to thank the Ohio Supercomputer Center (OSC) for the use of their computing facilities.  ... 
doi:10.1007/978-3-540-24596-4_44 fatcat:uf7sndsr6vbjnlq5hvdyeilkda

Practical Loop Transformations for Tensor Contraction Expressions on Multi-level Memory Hierarchies [chapter]

Wenjing Ma, Sriram Krishnamoorthy, Gagan Agrawal
2011 Lecture Notes in Computer Science  
Optimizing applications for such architectures requires careful management of the data movement across all these levels.  ...  In this paper, we focus on the problem of mapping tensor contractions to memory hierarchies with more than two levels, specifically addressing placement of memory allocation and data movement statements  ...  The tile sizes for the data movement placement can be determined by solving this non-linear constrained optimization problem.  ... 
doi:10.1007/978-3-642-19861-8_15 fatcat:eyf4rajoercptjnsktt3adiumi

Analytical cache modeling and tilesize optimization for tensor contractions

Rui Li, Aravind Sukumaran-Rajam, Richard Veras, Tze Meng Low, Fabrice Rastello, Atanas Rountev, P. Sadayappan
2019 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '19  
Tiling and loop permutation are key techniques for improving data locality.  ...  In this paper we provide an analytical model based approach to multi-level tile size optimization and permutation selection for tensor contractions.  ...  ACKNOWLEDGMENTS We thank the reviewers for their valuable feedback. This work was supported in part by the U.S.  ... 
doi:10.1145/3295500.3356218 dblp:conf/sc/LiSVLRRS19 fatcat:376oo5hdqbehtfell2khswwv7i

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver

Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, P. Sadayappan, Venkatesh Choppella
2006 Journal of Parallel and Distributed Computing  
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions.  ...  These loops operate on arrays too large to fit in physical memory. The problem involves determining optimal tiling and placement of disk I/O statements.  ...  Benjamin Wah and Yi Xin Chen of the University of Illinois for their significant help with using the Discrete Constrained Search (DCS) Solver.  ... 
doi:10.1016/j.jpdc.2005.06.017 fatcat:vbkqcart6jeyrals3umw2uqohe

A Quantum-Inspired Tensor Network Algorithm for Constrained Combinatorial Optimization Problems

Tianyi Hao, Xuxin Huang, Chunjing Jia, Cheng Peng
2022 Frontiers in Physics  
In this paper, we propose a quantum-inspired tensor-network-based algorithm for general locally constrained combinatorial optimization problems.  ...  Our algorithm constructs a Hamiltonian for the problem of interest, effectively mapping it to a quantum problem, then encodes the constraints directly into a tensor network state and solves the optimal  ...  The code used to generate the data presented in this study can be publicly accessed on GitHub at [30].  ... 
doi:10.3389/fphy.2022.906590 fatcat:numnghitqbcjzl4uf5cgxbbtju

A quantum-inspired tensor network method for constrained combinatorial optimization problems [article]

Tianyi Hao and Xuxin Huang and Chunjing Jia and Cheng Peng
2022 arXiv   pre-print
In this paper, we propose a quantum inspired algorithm for general locally constrained combinatorial optimization problems by encoding the constraints directly into a tensor network state.  ...  Combinatorial optimization is of general interest for both theoretical study and real-world applications.  ...  The code used to generate the data presented in this study can be publicly accessed on GitHub at [5].  ... 
arXiv:2203.15246v1 fatcat:y7iglyelvvde5hgupub4pezywy

Cross-scale efficient tensor contractions for coupled cluster computations through multiple programming model backends

Khaled Z. Ibrahim, Evgeny Epifanovsky, Samuel Williams, Anna I. Krylov
2017 Journal of Parallel and Distributed Computing  
These calculations are dominated by a sequence of tensor contractions, motivating the development of numerical libraries for such operations.  ...  We achieve up to 240× speedup compared with the optimized shared memory implementation of Libtensor.  ...  We would like to thank the anonymous reviewers for providing suggestions to get a better performance of NWChem runs and improve the presentation in this manuscript.  ... 
doi:10.1016/j.jpdc.2017.02.010 fatcat:mcrxnl4b2vaslg7r35sodabn3e

Lifetime-based Method for Quantum Simulation on a New Sunway Supercomputer [article]

Yaojian Chen, Yong Liu, Xinmin Shi, Jiawei Song, Xin Liu, Lin Gan, Chu Guo, Haohuan Fu, Dexun Chen, Guangwen Yang
2022 arXiv   pre-print
Faster classical simulation becomes essential for the validation of quantum computer, and tensor network contraction is a widely-applied simulation approach.  ...  Due to the memory limitation, slicing is adopted to help cutting down the memory size by reducing the tensor dimension, which also leads to additional computation overhead.  ...  ACKNOWLEDGMENT We would like to thank Man-Hong Yung, Pengfei Zhou, Zegang Li and Yuxuan Li for advices and discussions.  ... 
arXiv:2205.00393v1 fatcat:vtraiuzdxreu7mhsdgqhr4vsai

Closing the "Quantum Supremacy" Gap: Achieving Real-Time Simulation of a Random Quantum Circuit Using a New Sunway Supercomputer [article]

Yong Li, Haohuan Fu, Yuling Yang, Jiawei Song, Pengpeng Zhao, Zhen Wang, Dajia Peng, Huarong Chen, Chu Guo, Heliang Huang, Wenzhao Wu, Dexun Chen
2021 arXiv   pre-print
to about 42 million cores; (3) a fused permutation and multiplication design that improves the compute efficiency for a wide range of tensor contraction scenarios; and (4) a mixed-precision scheme to  ...  We develop a high-performance tensor-based simulator for random quantum circuits (RQCs) on the new Sunway supercomputer.  ...  Acknowledgement We would like to thank Dapeng Yu, Heng Fan, Guoping Guo, Yongjian Han, and Xiaobo Zhu for advices and discussions.  ... 
arXiv:2110.14502v1 fatcat:otiuvk735ncarbngangfrq777i

Analysis and tuning of libtensor framework on multicore architectures

Khaled Z. Ibrahim, Samuel W. Williams, Evgeny Epifanovsky, Anna I. Krylov
2014 2014 21st International Conference on High Performance Computing (HiPC)  
It has been optimized for symmetry and sparsity to be memory efficient. This allows it to run efficiently on the ubiquitous and cost-effective SMP architectures.  ...  To that end, in this paper, we explore a number of optimization techniques including a thread-friendly and NUMA-aware memory allocator and garbage collector, tuning the tensor tiling factor, and tuning  ...  Data locality The performance of BLAS routines, typically used in tensor contractions, depends on the locality of the data.  ... 
doi:10.1109/hipc.2014.7116881 dblp:conf/hipc/IbrahimWEK14 fatcat:fi63t7ifjne57bzfqor2rhvkua

Empirical Performance-Model Driven Data Layout Optimization [chapter]

Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Gerald Baumgartner, J. Ramanujam, P. Sadayappan
2005 Lecture Notes in Computer Science  
The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific  ...  language for expressing computational models in quantum chemistry.  ...  The approach was developed for a program synthesis system targeted at the quantum chemistry domain.  ... 
doi:10.1007/11532378_7 fatcat:rsnop6hjx5fltjrj2ivbfx2mdi

AutoHOOT: Automatic High-Order Optimization for Tensors [article]

Linjian Ma, Jiayu Ye, Edgar Solomonik
2020 arXiv   pre-print
These tensor methods are used for data analysis and simulation of quantum systems.  ...  High-order optimization methods, including Newton's method and its variants as well as alternating minimization methods, dominate the optimization algorithms for tensor decompositions and tensor networks  ...  Constrained Contraction Path Construction We provide a constrained contraction path selection routine, such that the contraction path is optimized under the constraint that partial inputs' contraction  ... 
arXiv:2005.04540v2 fatcat:4kfib322srf7xen555toxa7774

Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models

G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, Xiaoyang Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, Chi-chung Lam (+6 others)
2005 Proceedings of the IEEE  
These computations are expressible as a set of tensor contractions and arise in electronic structure modeling.  ...  This paper provides an overview of a program synthesis system for a class of quantum chemistry computations.  ...  DATA LOCALITY OPTIMIZATION Once a solution is found that fits onto disk, we optimize the data locality to reduce memory and disk access times.  ... 
doi:10.1109/jproc.2004.840311 fatcat:gaxrixebifhstbb7ul2gsrpc4m