Loop Distribution and Fusion with Timing and Code Size Optimization for Embedded DSPs
[chapter]
2005
Lecture Notes in Computer Science
In this paper, we propose a new technique combining loop distribution with direct loop fusion, which will improve the timing performance without jeopardizing the code size. ...
Then, we propose the technique of maximum loop distribution with direct loop fusion. ...
We compare the code sizes and the execution time of the original loops with those of the fused loops produced by the technique of maximum loop distribution with direct fusion (MLD DF). ...
doi:10.1007/11596356_15
fatcat:xeazl5tg7rctdmymoyofrjfqpa
Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization
[chapter]
2005
Lecture Notes in Computer Science
POP has been extensively optimized for X1 by instrumenting the code using X1 compiler directives. ...
We compare and contrast automatic and manual optimization schemes available on X1 and analyze their impact on the code performance and scalability. ...
Acknowledgements This research was sponsored by the Office of Mathematical, Information, and Computational Sciences, Office of Science, U.S. Department of Energy under Contract No. ...
doi:10.1007/11428831_38
fatcat:q2ffopz6rndlpltpksmtv5cpba
The implementation and evaluation of fusion and contraction in array languages
1998
Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation - PLDI '98
Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction. ...
We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages. ...
This research was conducted using the resources of the Cornell Theory Center, which receives major funding from the National Science Foundation and New York State, with additional support from the National ...
doi:10.1145/277650.277663
dblp:conf/pldi/LewisLS98
fatcat:j5pz5cn75rectbklnqdgw5m2me
The implementation and evaluation of fusion and contraction in array languages
1998
SIGPLAN notices
Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction. ...
We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages. ...
This research was conducted using the resources of the Cornell Theory Center, which receives major funding from the National Science Foundation and New York State, with additional support from the National ...
doi:10.1145/277652.277663
fatcat:mmz6zbfeprgobl46veaxgnncr4
Loop fusion for memory space optimization
2001
Proceedings of the 14th international symposium on Systems synthesis - ISSS '01
This paper presents an optimal algorithm to reduce the use of temporary arrays by loop fusion. Although the algorithm is not polynomial, experiments have shown that it is very efficient. ...
These transformations are performed on "for" loops that constitute the main parts which handle the arrays of the multimedia code. ...
Modeling the Problem We approximate memory size requirement by the maximum size of time overlapping arrays. ...
doi:10.1145/500001.500025
fatcat:bmri6fzjkvegpkr7cgalgywvu4
Loop fusion for memory space optimization
2001
Proceedings of the 14th international symposium on Systems synthesis - ISSS '01
This paper presents an optimal algorithm to reduce the use of temporary arrays by loop fusion. Although the algorithm is not polynomial, experiments have shown that it is very efficient. ...
These transformations are performed on "for" loops that constitute the main parts which handle the arrays of the multimedia code. ...
Modeling the Problem We approximate memory size requirement by the maximum size of time overlapping arrays. ...
doi:10.1145/500024.500025
fatcat:yatpxeiekbfr7m4xksqaggga3u
Schedule Synthesis for Halide Pipelines on GPUs
2020
ACM Transactions on Architecture and Code Optimization (TACO)
The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization ...
As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities. ...
Loop tiling is often used alongside kernel fusion to exploit parallelism and enable both spatial and temporal reuse across stages. ...
doi:10.1145/3406117
fatcat:wqtxe4g7hnc6lcwgjbuhfirnk4
A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation
1996
Scientific Programming
Special attention is paid to the optimization of memory use and memory traffic. ...
Some issues in the relationship of coding style and compiler optimization are discussed with regard to Fortran 90 array notation. ...
ACKNOWLEDGMENTS I thank John Prentice and Reg Clemens of Quetzal Computational Associates, Preston Briggs of Tera Computer, and Jon Steidel and Bill Homer of Cray Research for helpful comments on earlier ...
doi:10.1155/1996/208679
fatcat:aetkysbwabalxhytuupnuzff4q
Optimizing the memory bandwidth with loop fusion
2004
Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04
At the same time, the assignment also strongly influences the energy cost. Therefore, we combine in our approach the fusion and assignment decisions. ...
We propose a technique to optimize the memory bandwidth across the boundaries of a basic block. Our technique incrementally fuses loops to better use the available bandwidth. ...
We apply our loop fusion algorithm for different memory sizes. We always use the fastest possible assignment. ...
doi:10.1145/1016720.1016767
dblp:conf/codes/MarchalGC04
fatcat:vz5wh3q5zfcghpnnzq627wvxua
Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems
[chapter]
2002
Lecture Notes in Computer Science
To test our approach, we have implemented bank-conscious versions of three loop transformation techniques (loop fission/fusion, linear loop transformations, and loop tiling) using an experimental compiler infrastructure, and measured the energy benefits using nine array-dominated codes. ...
After applying loop fission and fusion, within the outer for-loop (in Figure 4), each of the nests is optimized using loop permutation and tiling for off-chip memory energy and data locality. ...
doi:10.1007/3-540-45937-5_20
fatcat:pjrhrtdr7zhfpcoq35djuasbfm
Shared buffer implementations of signal processing systems using lifetime analysis techniques
2001
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
In this paper, we build on our previously developed analysis and optimization framework for looped schedules to formally tackle the problem of generating optimally compact schedules for SDF graphs. ...
The method we use is that of lifetime analysis; we develop a model for buffer lifetimes in SDF graphs and develop scheduling algorithms that attempt to generate schedules that minimize the maximum number ...
Edwards and their anonymous referees, for their helpful remarks for improving the readability and presentation of the paper. ...
doi:10.1109/43.908427
fatcat:mxr3qdgrtbadtccfs3rdeb3isu
PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives
[article]
2020
arXiv
pre-print
However, given the constant emergence of new DNN architectures, creating hand optimized code is expensive, slow and is not scalable. ...
Once the model is created, its use in the intended application - the inference task, is computationally heavy too and the inference needs to be fast for real time use. ...
PolyDL performs outer loop optimization around the call to the matrix multiplication microkernel by loop reordering and tiling using various tile sizes. ...
arXiv:2006.02230v2
fatcat:sjxmey3zzndupip75cf3uae4rq
Optimizing Inter-Nest Data Locality Using Loop Splitting and Reordering
2007
2007 IEEE International Parallel and Distributed Processing Symposium
In this paper, we present a compiler strategy that optimizes inter-nest data locality using code restructuring and loop transformations. ...
The transformed program is then further optimized using loop transformations. ...
Figure 4. Effects of splitting and fusion on code size
Figure 5. Effects of cache size (a) on effective data access time (b) on average miss rate
Figure 6. Effects of cache size on k14 and ...
doi:10.1109/ipdps.2007.370399
dblp:conf/ipps/Naci07
fatcat:5f7lijpyiredlcz5nfk2cdq5pu
Advanced Scalarization of Array Syntax
[chapter]
2000
Lecture Notes in Computer Science
These same compilers then make additional subsequent passes to perform loop optimizations such as loop fusion. ...
Standard Scalarization and Optimization At some point during the compilation of a Fortran 90 program, array assignment statements must be translated into serial DO-loops. This process is known as ...
Acknowledgments I'd like to thank Robert Corbett, Larry Meadows, and Prakash Narayan of Sun Microsystems for their support of this work. ...
doi:10.1007/3-540-46423-9_15
fatcat:bxz2f2jbkrapdpi2xj7txaffpa
Collective loop fusion for array contraction
[chapter]
1993
Lecture Notes in Computer Science
In this paper we propose a method for applying the loop fusion and array contraction optimizations across a collection of loop nests. ...
Our partitioning method is both novel and efficient; it is novel in that it uses the max-flow min-cut algorithm to minimize the cost of array operations, and it is efficient in that it executes in polynomial-time ...
Acknowledgments We gratefully acknowledge the support from IBM Corporation, the National Science and Engineering Research Council (NSERC), and Radhika Thekkath was supported in part by NSF PYI Award #MIP ...
doi:10.1007/3-540-57502-2_53
fatcat:2moutk2qifbmdbxclfhwb7jnnu