Loop optimization for a class of memory-constrained computations
2001
Proceedings of the 15th international conference on Supercomputing - ICS '01
Given an operation-count-optimal form of the computation (from the solution to the above sub-problem), perform appropriate loop transformations to optimize its execution, subject to memory capacity limitations ...
This paper develops an integrated model combining loop tiling for enhancing data reuse, and loop fusion for reduction of memory for intermediate temporary arrays. ...
Acknowledgments We would like to thank the National Center for Supercomputing Applications (NCSA) and the Ohio Supercomputer Center (OSC) for the use of their computing facilities. ...
doi:10.1145/377792.377814
dblp:conf/ics/CociorvaWLBRS01
fatcat:hyddxiiourahlozib63gk4kxly
Parallelization via constrained storage mapping optimization
[chapter]
1999
Lecture Notes in Computer Science
A framework for parallel execution order and storage mapping computation is designed, allowing time and space optimization. ...
Constrained expansion, a theoretical model for expansion strategies, is shown to be very useful in this context. ...
Acknowledgments: Thanks to Denis Barthou, Jean-François Collard, Vincent Lefebvre, Paul Feautrier, and Laurent Vibert, for their help and support. ...
doi:10.1007/bfb0094913
fatcat:nefazunuozfp7de2ekljr6olxu
Efficient synthesis of out-of-core algorithms using a nonlinear optimization solver
2006
Journal of Parallel and Distributed Computing
We address the problem of efficient out-of-core code generation for a special class of imperfectly nested loops encoding tensor contractions. ...
These loops operate on arrays too large to fit in physical memory. The problem involves determining optimal tiling and placement of disk I/O statements. ...
We are grateful to the Ohio Supercomputer Center (OSC) for the use of their computing facilities. ...
doi:10.1016/j.jpdc.2005.06.017
fatcat:vbkqcart6jeyrals3umw2uqohe
Memory-Constrained Communication Minimization for a Class of Array Computations
[chapter]
2005
Lecture Notes in Computer Science
In this paper, we address the memory-constrained communication minimization problem in the context of this class of computations. ...
The effectiveness of the developed optimization approach is demonstrated on a computation representative of a component used in quantum chemistry suites. ...
Acknowledgments We acknowledge the support of the National Science Foundation through the Information Technology Research program (CHE-0121676 and CHE-0121706), and NSF grants CCR-0073800 and EIA-9986052. ...
doi:10.1007/11596110_1
fatcat:quptxabwsbfcljrpwb2cpuxcn4
Optimization of Memory Usage Requirement for a Class of Loops Implementing Multi-dimensional Integrals
[chapter]
2000
Lecture Notes in Computer Science
Based on a framework that models the relationship between loop fusion and memory usage, we propose an algorithm for finding a loop fusion configuration that minimizes memory usage. ...
In the context of these integral calculations, this paper addresses a memory usage minimization problem. ...
Introduction This paper addresses the optimization of a class of loop computations that implement multi-dimensional integrals of the product of several arrays. ...
doi:10.1007/3-540-44905-1_22
fatcat:umdhvdt6e5b2vp7fc3hy33rdyq
Data Locality Optimization for Synthesis of Efficient Out-of-Core Algorithms
[chapter]
2003
Lecture Notes in Computer Science
This paper describes an approach to synthesis of efficient out-of-core code for a class of imperfectly nested loops that represent tensor contraction computations. ...
The developed approach combines loop fusion with loop tiling and uses a performance-model driven approach to loop tiling for the generation of out-of-core code. ...
Department of Energy through award DE-AC05-00OR22725. We would also like to thank the Ohio Supercomputer Center (OSC) for the use of their computing facilities. ...
doi:10.1007/978-3-540-24596-4_44
fatcat:uf7sndsr6vbjnlq5hvdyeilkda
Memory-Constrained Data Locality Optimization for Tensor Contractions
[chapter]
2004
Lecture Notes in Computer Science
In this paper, we address the memory-constrained data-locality optimization problem in the context of this class of computations. ...
To optimize the performance of such computations a combination of loop fusion and loop tiling is required, so that the cost of disk I/O is minimized. ...
Acknowledgments We thank the National Science Foundation for its support of this research through the Information Technology Research program (CHE-0121676 and CHE-0121706), NSF grants CCR-0073800 and EIA ...
doi:10.1007/978-3-540-24644-2_7
fatcat:ixthi66h3rgr5onjrwdobvz2oe
Locality optimization in wireless applications
2007
Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis - CODES+ISSS '07
There is a strong need now for compilers of embedded systems to find effective ways of optimizing series of loop-nests, wherein the majority of the memory references occur in the form of multi-dimensional ...
We propose a novel solution to the multiple loop-nest optimization problem using the concept of constraints. ...
compute the reuse vectors rij for the loop-nest li; compute the dependence vectors dij; compute the dependence-ratio using dijs; endfor; partition L into four reuse-class: SV: loop-nests with just one best ...
doi:10.1145/1289816.1289850
dblp:conf/codes/AbsarLRLJVC07
fatcat:iy6wxdgfwvgvblnz564bqy5joi
Using the SSA-Form in a Code Generator
[chapter]
2014
Lecture Notes in Computer Science
In high-end compilers such as Open64, GCC or LLVM, the Static Single Assignment (SSA) form is a structural part of the target-independent program representation that supports most of the code optimizations ...
We discuss some of the issues of inserting the SSA form in a code generator, specifically: what are the challenges of maintaining the SSA form on a program representation based on machine instructions; ...
Non-reducible control-flow allows for different loop nesting forests for a given CFG, yet high-level information such as loop-carried memory dependences, or user-level loop annotations, are provided to ...
doi:10.1007/978-3-642-54807-9_1
fatcat:c3b2j6gf6fepfbclxn2xzsvcw4
Space-time trade-off optimization for a class of electronic structure calculations
2002
Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02
that fits within a specified memory limit. ...
There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number ...
programs for a class of computations encountered in quantum chemistry. ...
doi:10.1145/512549.512551
fatcat:y3zuziytzjallmtgysjt2472dq
Space-time trade-off optimization for a class of electronic structure calculations
2002
Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation - PLDI '02
that fits within a specified memory limit. ...
There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number ...
programs for a class of computations encountered in quantum chemistry. ...
doi:10.1145/512529.512551
dblp:conf/pldi/CociorvaBLSRNBH02
fatcat:hdy6zbuuhrggjf7kwlalazbcmi
Space-time trade-off optimization for a class of electronic structure calculations
2002
SIGPLAN notices
that fits within a specified memory limit. ...
There are usually a large number of alternative ways of implementing the tensor contractions, representing different trade-offs between the space required for temporary intermediates and the total number ...
programs for a class of computations encountered in quantum chemistry. ...
doi:10.1145/543552.512551
fatcat:dotf2cim75e3vftocdzhc3hvki
Asynchronous Stochastic Frank-Wolfe Algorithms for Non-Convex Optimization
2019
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence
The experimental results on real high-dimensional gray-scale images not only confirm the fast convergence of our algorithms, but also show a near-linear speedup on a parallel system with shared memory ...
To the best of our knowledge, AsySFW and AsySVFW are the first asynchronous parallel stochastic algorithms with convergence guarantees for solving the constrained non-convex optimization problems. ...
Most existing Frank-Wolfe algorithms were designed for constrained smooth convex optimization problems. ...
doi:10.24963/ijcai.2019/104
dblp:conf/ijcai/GuXH19
fatcat:y2fccqpyqzhnfc2hqqdelh3fte
Memory-constrained Block Processing Optimization for Synthesis of DSP Software
2006
2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
Our experimental results indicate that these methods derive optimal memory-constrained block processing solutions most of the time. ...
It is important to take into account this form of processing when implementing embedded software for DSP systems. ...
Specifically, the APGAN and GDPPO algorithms are employed in this work to compute buffer-efficient SASs as a starting point for our memory-constrained block processing optimization [1].
doi:10.1109/icsamos.2006.300820
dblp:conf/samos/KoSB06
fatcat:cnp36gsaanf7fpgxvjudrjm4cu
Machine Learning for Microcontroller-Class Hardware – A Review
[article]
2022
arXiv
pre-print
We characterize a closed-loop widely applicable workflow of machine learning model development for microcontroller class devices and show that several classes of applications adopt a specific instance ...
Conventional machine learning deployment has high memory and compute footprint hindering their direct deployment on ultra resource-constrained microcontrollers. ...
• We illustrate a coherent and closed-loop ML model development and deployment workflow for microcontrollers. We delineate each block in the workflow, providing both ...
arXiv:2205.14550v3
fatcat:y272riitirhwfgfiotlwv5i7nu