Loop optimization for a class of memory-constrained computations

D. Cociorva, J. W. Wilkins, C. Lam, G. Baumgartner, J. Ramanujam, P. Sadayappan
2001 Proceedings of the 15th international conference on Supercomputing - ICS '01  
Given an operation-count-optimal form of the computation (from the solution to the above sub-problem), perform appropriate loop transformations to optimize its execution, subject to memory capacity limitations  ...  This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays.  ...  Acknowledgments We would like to thank the National Center for Supercomputing Applications (NCSA) and the Ohio Supercomputer Center (OSC) for the use of their computing facilities.  ... 
doi:10.1145/377792.377814 dblp:conf/ics/CociorvaWLBRS01 fatcat:hyddxiiourahlozib63gk4kxly

Parallelization via constrained storage mapping optimization [chapter]

Albert Cohen
1999 Lecture Notes in Computer Science  
A framework for parallel execution order and storage mapping computation is designed, allowing time and space optimization.  ...  Constrained expansion, a theoretical model for expansion strategies, is shown to be very useful in this context.  ...  Acknowledgments: Thanks to Denis Barthou, Jean-François Collard, Vincent Lefebvre, Paul Feautrier, and Laurent Vibert, for their help and support.  ... 
doi:10.1007/bfb0094913 fatcat:nefazunuozfp7de2ekljr6olxu

Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver

Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Chi-Chung Lam, J. Ramanujam, P. Sadayappan, Venkatesh Choppella
2006 Journal of Parallel and Distributed Computing  
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions.  ...  These loops operate on arrays too large to fit in physical memory. The problem involves determining optimal tiling and placement of disk I/O statements.  ...  We are grateful to the Ohio Supercomputer Center (OSC) for the use of their computing facilities.  ... 
doi:10.1016/j.jpdc.2005.06.017 fatcat:vbkqcart6jeyrals3umw2uqohe

Memory-Constrained Communication Minimization for a Class of Array Computations [chapter]

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam
2005 Lecture Notes in Computer Science  
In this paper, we address the memory-constrained communication minimization problem in the context of this class of computations.  ...  The effectiveness of the developed optimization approach is demonstrated on a computation representative of a component used in quantum chemistry suites.  ...  Acknowledgments We acknowledge the support of the National Science Foundation through the Information Technology Research program (CHE-0121676 and CHE-0121706), and NSF grants CCR-0073800 and EIA-9986052.  ... 
doi:10.1007/11596110_1 fatcat:quptxabwsbfcljrpwb2cpuxcn4

Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals [chapter]

Chi-Chung Lam, Daniel Cociorva, Gerald Baumgartner, P. Sadayappan
2000 Lecture Notes in Computer Science  
Based on a framework that models the relationship between loop fusion and memory usage, we propose an algorithm for finding a loop fusion configuration that minimizes memory usage.  ...  In the context of these integral calculations, this paper addresses a memory usage minimization problem.  ...  Introduction This paper addresses the optimization of a class of loop computations that implement multi-dimensional integrals of the product of several arrays.  ... 
doi:10.1007/3-540-44905-1_22 fatcat:umdhvdt6e5b2vp7fc3hy33rdyq
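
The link between loop fusion and memory usage mentioned in this abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper: the function names `unfused` and `fused`, the element-wise product, and the row-sum reduction are all assumptions chosen for illustration. Fusing the producer and consumer loops lets the n-by-m temporary shrink to a single scalar while the result stays the same.

```python
def unfused(A, B):
    """Two separate loop nests; the intermediate T holds n*m elements."""
    n, m = len(A), len(A[0])
    # Producer loop nest: materialize the full temporary array.
    T = [[A[i][j] * B[i][j] for j in range(m)] for i in range(n)]
    # Consumer loop nest: reduce the temporary over j.
    S = [sum(T[i][j] for j in range(m)) for i in range(n)]
    return S, n * m  # result and number of temporary elements stored


def fused(A, B):
    """One fused loop nest; the intermediate is contracted to a scalar."""
    n, m = len(A), len(A[0])
    S = [0] * n
    for i in range(n):
        for j in range(m):
            t = A[i][j] * B[i][j]  # scalar temporary replaces T[i][j]
            S[i] += t
    return S, 1  # result and number of temporary elements stored


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(unfused(A, B))
print(fused(A, B))
```

Both versions do the same arithmetic; only the storage for the intermediate differs, which is the trade-off a fusion configuration controls.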

Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms [chapter]

Sandhya Krishnan, Sriram Krishnamoorthy, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, David E. Bernholdt, Venkatesh Choppella
2003 Lecture Notes in Computer Science  
This paper describes an approach to synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations.  ...  The developed approach combines loop fusion with loop tiling and uses a performance-model driven approach to loop tiling for the generation of out-of-core code.  ...  Department of Energy through award DE-AC05-00OR22725. We would also like to thank the Ohio Supercomputer Center (OSC) for the use of their computing facilities.  ... 
doi:10.1007/978-3-540-24596-4_44 fatcat:uf7sndsr6vbjnlq5hvdyeilkda

Memory-Constrained Data Locality Optimization for Tensor Contractions [chapter]

Alina Bibireata, Sandhya Krishnan, Gerald Baumgartner, Daniel Cociorva, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, David E. Bernholdt, Venkatesh Choppella
2004 Lecture Notes in Computer Science  
In this paper, we address the memory-constrained data-locality optimization problem in the context of this class of computations.  ...  To optimize the performance of such computations a combination of loop fusion and loop tiling is required, so that the cost of disk I/O is minimized.  ...  Acknowledgments We thank the National Science Foundation for its support of this research through the Information Technology Research program (CHE-0121676 and CHE-0121706), NSF grants CCR-0073800 and EIA  ... 
doi:10.1007/978-3-540-24644-2_7 fatcat:ixthi66h3rgr5onjrwdobvz2oe

Locality optimization in wireless applications

Javed Absar, Min Li, Praveen Raghavan, Andy Lambrechts, Murali Jayapala, Arnout Vandecappelle, Francky Catthoor
2007 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis - CODES+ISSS '07  
There is a strong need now for compilers of embedded systems to find effective ways of optimizing series of loop-nests, wherein majority of the memory references occur in the form of multi-dimensional  ...  We propose a novel solution to multiple loop-nest optimization problem using the concept of constraints.  ...  compute the reuse vectors r_ij for the loop-nest l_i; compute the dependence vectors d_ij; compute the dependence-ratio using d_ij's; endfor; partition L into four reuse-class: SV: loop-nests with just one best  ... 
doi:10.1145/1289816.1289850 dblp:conf/codes/AbsarLRLJVC07 fatcat:iy6wxdgfwvgvblnz564bqy5joi

Using the SSA-Form in a Code Generator [chapter]

Benoît Dupont de Dinechin
2014 Lecture Notes in Computer Science  
In high-end compilers such as Open64, GCC or LLVM, the Static Single Assignment (SSA) form is a structural part of the target-independent program representation that supports most of the code optimizations  ...  We discuss some of the issues of inserting the SSA form in a code generator, specifically: what are the challenges of maintaining the SSA form on a program representation based on machine instructions;  ...  Non-reducible control-flow allows for different loop nesting forests for a given CFG, yet high-level information such as loop-carried memory dependences, or user-level loop annotations, are provided to  ... 
doi:10.1007/978-3-642-54807-9_1 fatcat:c3b2j6gf6fepfbclxn2xzsvcw4

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
that fits within a specified memory limit.  ...  There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number  ...  programs for a class of computations encountered in quantum chemistry.  ... 
doi:10.1145/512549.512551 fatcat:y3zuziytzjallmtgysjt2472dq

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
2002 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02  
that fits within a specified memory limit.  ...  There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number  ...  programs for a class of computations encountered in quantum chemistry.  ... 
doi:10.1145/512529.512551 dblp:conf/pldi/CociorvaBLSRNBH02 fatcat:hdy6zbuuhrggjf7kwlalazbcmi

Space-time trade-off optimization for a class of electronic structure calculations

Daniel Cociorva, Gerald Baumgartner, Chi-Chung Lam, P. Sadayappan, J. Ramanujam, Marcel Nooijen, David E. Bernholdt, Robert Harrison
2002 SIGPLAN notices  
that fits within a specified memory limit.  ...  There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number  ...  programs for a class of computations encountered in quantum chemistry.  ... 
doi:10.1145/543552.512551 fatcat:dotf2cim75e3vftocdzhc3hvki

Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization

Bin Gu, Wenhan Xian, Heng Huang
2019 Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence  
The experimental results on real high-dimensional gray-scale images not only confirm the fast convergence of our algorithms, but also show a near-linear speedup on a parallel system with shared memory  ...  To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving the constrained non-convex optimization problems.  ...  Most of existing Frank-Wolfe algorithms were designed for the constrained smooth convex optimization problems.  ... 
doi:10.24963/ijcai.2019/104 dblp:conf/ijcai/GuXH19 fatcat:y2fccqpyqzhnfc2hqqdelh3fte

Memory-constrained Block Processing Optimization for Synthesis of DSP Software

Ming-yung Ko, Chung-ching Shen, Shuvra Bhattacharyya
2006 2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation  
Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time.  ...  It is important to take into account this form of processing when implementing embedded software for DSP systems.  ...  Specifically, the APGAN and GDPPO algorithms are employed in this work to compute buffer-efficient SASs as a starting point for our memory-constrained block processing optimization [1]. A.  ... 
doi:10.1109/icsamos.2006.300820 dblp:conf/samos/KoSB06 fatcat:cnp36gsaanf7fpgxvjudrjm4cu

Machine Learning for Microcontroller-Class Hardware – A Review [article]

Swapnil Sayan Saha, Sandeep Singh Sandha, Mani Srivastava
2022 arXiv pre-print
We characterize a closed-loop widely applicable workflow of machine learning model development for microcontroller class devices and show that several classes of applications adopt a specific instance  ...  Conventional machine learning deployment has high memory and compute footprint hindering their direct deployment on ultra resource-constrained microcontrollers.  ...  • We illustrate a coherent and closed-loop ML model development and deployment workflow for microcontrollers. We delineate each block in the workflow, providing both  ... 
arXiv:2205.14550v3 fatcat:y272riitirhwfgfiotlwv5i7nu