
Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations [chapter]

Albert Hartono, Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Marcel Nooijen, Gerald Baumgartner, David E. Bernholdt, Venkatesh Choppella, Russell M. Pitzer, J. Ramanujam, Atanas Rountev, P. Sadayappan
2006 Lecture Notes in Computer Science  
The identification of common subexpressions among a set of tensor contraction expressions can result in a reduction of the total number of operations required to evaluate the tensor contractions.  ...  In this paper, we develop an effective algorithm for common subexpression identification and demonstrate its effectiveness on tensor contraction expressions for coupled cluster equations.  ...  In this new formulation, the it arrays are the common subexpressions identified to reduce the operation count.  ... 
doi:10.1007/11758501_39 fatcat:2r6nch5y3zgehhvyjfu57t34vm
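
The snippet above says that identifying common subexpressions across a set of tensor contractions can lower the total operation count. A minimal sketch of that idea follows. It is not the paper's algorithm: the two expressions R1 = A·B·C and R2 = A·B·D, the helper names, and the multiplication counter are all assumptions chosen for illustration. The shared intermediate T = A·B is computed once and reused.

```python
def matmul(x, y, counter):
    """Multiply two dense matrices (lists of lists), counting scalar multiplications."""
    rows, inner, cols = len(x), len(y), len(y[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                out[i][j] += x[i][k] * y[k][j]
                counter[0] += 1
    return out


def evaluate_naive(a, b, c, d):
    """Evaluate R1 = A*B*C and R2 = A*B*D independently (A*B is computed twice)."""
    counter = [0]
    r1 = matmul(matmul(a, b, counter), c, counter)
    r2 = matmul(matmul(a, b, counter), d, counter)
    return r1, r2, counter[0]


def evaluate_shared(a, b, c, d):
    """Evaluate the same two results, reusing the common intermediate T = A*B."""
    counter = [0]
    t = matmul(a, b, counter)  # hypothetical shared intermediate
    r1 = matmul(t, c, counter)
    r2 = matmul(t, d, counter)
    return r1, r2, counter[0]


n = 3
a = [[i + j for j in range(n)] for i in range(n)]
b = [[i * j + 1 for j in range(n)] for i in range(n)]
c = [[i - j for j in range(n)] for i in range(n)]
d = [[2 * i + j for j in range(n)] for i in range(n)]

naive = evaluate_naive(a, b, c, d)
shared = evaluate_shared(a, b, c, d)
print("naive multiplications: ", naive[2])
print("shared multiplications:", shared[2])
```

For square n×n operands, the naive evaluation performs four matrix products (4n³ multiplications) and the shared version three (3n³), with identical results.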

Effective Utilization of Tensor Symmetry in Operation Optimization of Tensor Contraction Expressions

Pai-Wei Lai, Huaijian Zhang, Samyam Rajbhandari, Edward Valeev, Karol Kowalski, P. Sadayappan
2012 Procedia Computer Science  
In this paper, we address the effective exploitation of symmetry properties of tensors in performing algebraic transformations for minimizing operation count of tensor expressions.  ...  We demonstrate significant improvements to the operation counts for the coupled cluster method when compared to several state-of-the-art implementations.  ...  We are also grateful to Dr. Marcel Nooijen for his guidance and support.  ... 
doi:10.1016/j.procs.2012.04.044 fatcat:n7znbmaizbac3k5do3abzq6ckm

Optimized code generation for finite element local assembly using symbolic manipulation

Francis P. Russell, Paul H. J. Kelly
2013 ACM Transactions on Mathematical Software  
We systematically evaluate the approach, measuring operation count, execution time and numerical error using a benchmark suite of synthetic variational forms, comparing against the FEniCS Form Compiler  ...  However, even for a theoretical performance indicator such as operation count, an optimal strategy for local assembly is unknown.  ...  In particular, the Intel C++ Compiler 12.0 appears to be significantly more effective at reducing the operation count of optimized FFC-generated quadrature and tensor contraction implementations in comparison  ... 
doi:10.1145/2491491.2491496 fatcat:ibkmnmw6tvf5liiurvcoajitiq

Two- and three-pion finite-volume spectra at maximal isospin from lattice QCD [article]

Ben Hörz, Andrew Hanlon
2019 arXiv   pre-print
and nonzero total momentum, in addition to the ground states in these channels.  ...  The required correlation functions, from which the spectrum is extracted, are computed using a newly implemented algorithm which reduces the number of operations, and hence speeds up the computation by  ...  The method we use to reduce the operation count required for the evaluation of tensor contractions was proposed in the context of quantum chemistry [52] [53] [54] and consists of two parts.  ... 
arXiv:1905.04277v1 fatcat:olp42fsxvrhelmzhn4rvjki5su

TSFC: A Structure-Preserving Form Compiler

Miklós Homolya, Lawrence Mitchell, Fabio Luporini, David A. Ham
2018 SIAM Journal on Scientific Computing  
This is also achieved in part by a two-stage approach that cleanly separates the lowering of finite element constructs to tensor algebra in the first stage, from the scheduling of those tensor operations  ...  TSFC features a novel, structure-preserving method for separating the contributions of a form to the subblocks of the local tensor in discontinuous Galerkin problems.  ...  TSFC relies on several recent enhancements to UFL by Martin S. Alnaes and Andrew T. T. McRae.  ... 
doi:10.1137/17m1130642 fatcat:b37smdhsirhtbdjtahzxcgagoi

A Fully Traceless Cartesian Multipole Formulation for the Distributed Fast Multipole Method [article]

Jonathan P. Coles, Rebekka Bieri
2018 arXiv   pre-print
In realistic tests of biophysical simulations we observe a 20% speed-up, demonstrating the efficiency and improved performance of these routines compared to non-traceless tensor operators.  ...  Using the traceless tensor form significantly reduces memory usage and network communication traffic for large-scale applications in molecular dynamics and astrophysics.  ...  The optimizing phase reduces the number of mathematical operations by extracting common subexpressions within an operator to intermediate variables and factoring out common constants.  ... 
arXiv:1811.06332v1 fatcat:2wzx7zohw5hmje7h6lr5qaayxu

Unified Form Language: A domain-specific language for weak formulations of partial differential equations [article]

Martin S. Alnaes, Anders Logg, Kristian B. Oelgaard, Marie E. Rognes, Garth N. Wells
2013 arXiv   pre-print
and flexible tensor algebra.  ...  With these features, UFL has been used to effortlessly express finite element methods for complex systems of partial differential equations in near-mathematical notation, resulting in compact, intuitive  ...  Acknowledgments The authors wish to thank Kent-Andre Mardal and Johannes Ring for their contributions to UFL, and Pearu Peterson for discussions about symbolic representations during the initial design  ... 
arXiv:1211.4047v2 fatcat:tqholucugjcx7mdl4dmup4lpz4

AutoHOOT: Automatic High-Order Optimization for Tensors [article]

Linjian Ma, Jiayu Ye, Edgar Solomonik
2020 arXiv   pre-print
In particular, AutoHOOT contains a new explicit Jacobian / Hessian expression generation kernel whose outputs maintain the input tensors' granularity and are easy to optimize.  ...  Experimental results show that AutoHOOT achieves competitive CPU and GPU performance for both tensor decomposition and tensor network applications compared to existing AD software and other tensor computation  ...  We assume the contraction time for each operation is proportional to the flop counts.  ... 
arXiv:2005.04540v2 fatcat:4kfib322srf7xen555toxa7774

An Algorithm for the Optimization of Finite Element Integration Loops

Fabio Luporini, David A. Ham, Paul H. J. Kelly
2017 ACM Transactions on Mathematical Software  
This algorithm, which exploits fundamental mathematical properties of finite element operators, is proven to achieve a locally optimal operation count.  ...  This validates the effectiveness of the algorithm presented here, and illustrates its limitations.  ...  Pre-evaluation can be seen as the generalization of tensor contraction (Section 2.3) to a wider class of sub-expressions.  ... 
doi:10.1145/3054944 fatcat:kitpmlah4nbpxc2dxqt6x6y6jq

Empirical performance model-driven data layout optimization and library call selection for tensor contraction expressions

Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Gerald Baumgartner, J. Ramanujam, P. Sadayappan
2012 Journal of Parallel and Distributed Computing  
The TCE incorporates several compile-time optimizations, including algebraic transformations [45, 46] and common subexpression elimination [26] for minimizing operation counts, finding the optimal evaluation  ...  In contrast to our other papers on the TCE [6, 26, 16, 15], in this paper, we address the problem of effective code generation for tensor contractions (products of multi-dimensional arrays) in terms of  ...  The summation (contraction) and non-summation indices in each contraction are identified as explained in Section 2.  ... 
doi:10.1016/j.jpdc.2011.09.006 fatcat:xxgixibzbzhr3o6sw34sympcpa

Parallel Solvers for Flexible Approximation Schemes in Multiparticle Simulation [chapter]

Masha Sosonkina, Igor Tsukerman
2006 Lecture Notes in Computer Science  
Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations Albert Hartono, Qingda Lu, Xiaoyang Gao, Sriram Krishnamoorthy, Marcel Nooijen, Gerald Baumgartner,  ...  Larriba-Pey 156 Multiscale Characteristics of Human Sleep EEG Time Series In-Ho Song, In-Young Kim, Doo-Soo Lee, Sun I.  ... 
doi:10.1007/11758501_12 fatcat:45ckgx3ijjafhjngtson4i5eli

Tensor Networks for Probabilistic Sequence Modeling [article]

Jacob Miller, Guillaume Rabusseau, John Terilla
2021 arXiv   pre-print
We first show that u-MPS enable sequence-level parallelism, with length-n sequences able to be evaluated in depth O(log n).  ...  Experiments on sequence modeling with synthetic and real text data show u-MPS outperforming a variety of baselines and effectively generalizing their predictions in the presence of limited data.  ...  As is common with recursive algorithms, caching intermediate information permits the naive cost of (n − 1) + (n − 2) + ⋯ + 1 = O(n²) transfer operator applications to be reduced to O(n).  ... 
arXiv:2003.01039v4 fatcat:bzqfowuqaneozdewx27dpnmlfu
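
The last snippet above describes caching intermediate results so that O(n²) operator applications drop to O(n). The sketch below illustrates only that counting argument. It is not the u-MPS algorithm: `apply_op` is a hypothetical stand-in for a transfer operator application, and the integer "state" is an assumption made to keep the example self-contained.

```python
def apply_op(state, symbol, counter):
    """Hypothetical stand-in for one transfer operator application; counts calls."""
    counter[0] += 1
    return (state * 31 + symbol) % 1_000_003


def prefixes_naive(symbols, counter):
    """Recompute every prefix state from scratch: 1 + 2 + ... + (n - 1) applications."""
    results = []
    for k in range(1, len(symbols)):
        state = 1
        for s in symbols[:k]:
            state = apply_op(state, s, counter)
        results.append(state)
    return results


def prefixes_cached(symbols, counter):
    """Carry the previous prefix state forward: n - 1 applications in total."""
    results = []
    state = 1
    for s in symbols[:-1]:
        state = apply_op(state, s, counter)
        results.append(state)
    return results


symbols = [4, 8, 15, 16, 23, 42]  # n = 6, arbitrary illustrative input
naive_count, cached_count = [0], [0]
naive_states = prefixes_naive(symbols, naive_count)
cached_states = prefixes_cached(symbols, cached_count)
print("naive applications: ", naive_count[0])
print("cached applications:", cached_count[0])
```

Both functions return the same prefix states. Only the number of operator applications differs: n(n − 1)/2 without caching against n − 1 with it.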

Synthesis of High-Performance Parallel Programs for a Class of ab Initio Quantum Chemistry Models

G. Baumgartner, A. Auer, D.E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, Xiaoyang Gao, R.J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, Chi-chung Lam (+6 others)
2005 Proceedings of the IEEE  
These computations are expressible as a set of tensor contractions and arise in electronic structure modeling.  ...  The input to the system is a high-level specification of the computation, from which the system can synthesize high-performance parallel code tailored to the characteristics of the target architecture  ...  Reduction of arithmetic operations has been traditionally done by compilers using the technique of common subexpression elimination.  ... 
doi:10.1109/jproc.2004.840311 fatcat:gaxrixebifhstbb7ul2gsrpc4m

Geometric Optimization of the Evaluation of Finite Element Matrices

Robert C. Kirby, L. Ridgway Scott
2007 SIAM Journal on Scientific Computing  
Beginning in [10], we suggested that the local evaluation of the stiffness matrix for multilinear forms for a single affine element could be written as contractions of a set of reference tensors, with  ...  We go beyond the complexity-reducing binary relations explored in [R. C. Kirby, A. Logg, L. R. Scott, and A. R. Terrel, SIAM J. Sci.  ...  The authors search for ways of reducing operation counts by heuristics for common subexpression elimination.  ... 
doi:10.1137/060660722 fatcat:ted3nonl5jdzff75ntfe5hx6gm

A Comparison of Big Data Frameworks on a Layered Dataflow Model

Claudia Misale, Maurizio Drocco, Marco Aldinucci, Guy Tremblay
2017 Parallel Processing Letters  
In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters.  ...  Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely  ...  Acknowledgements This work was partly supported by the EU-funded project TOREADOR (contract no. H2020-688797), the EU-funded project Rephrase (contract no. H2020-644235), and the 2015-2016 IBM Ph.D.  ... 
doi:10.1142/s0129626417400035 fatcat:bwsjg4qs7rf6jpkvqd5mnablqm