Loop Distribution and Fusion with Timing and Code Size Optimization for Embedded DSPs [chapter]

Meilin Liu, Qingfeng Zhuge, Zili Shao, Chun Xue, Meikang Qiu, Edwin H. -M. Sha
2005 Lecture Notes in Computer Science  
In this paper, we propose a new technique combining loop distribution with direct loop fusion, which will improve the timing performance without jeopardizing the code size.  ...  Then, we propose the technique of maximum loop distribution with direct loop fusion.  ...  We compare the code sizes and the execution time of the original loops with those of the fused loops produced by the technique of maximum loop distribution with direct fusion (MLD DF).  ... 
doi:10.1007/11596356_15 fatcat:xeazl5tg7rctdmymoyofrjfqpa

Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization [chapter]

Sadaf Alam, Jeffrey Vetter
2005 Lecture Notes in Computer Science  
POP has been extensively optimized for X1 by instrumenting the code using X1 compiler directives.  ...  We compare and contrast automatic and manual optimization schemes available on X1 and analyze their impact on the code performance and scalability.  ...  Acknowledgements This research was sponsored by the Office of Mathematical, Information, and Computational Sciences, Office of Science, U.S. Department of Energy under Contract No.  ... 
doi:10.1007/11428831_38 fatcat:q2ffopz6rndlpltpksmtv5cpba

The implementation and evaluation of fusion and contraction in array languages

E. Christopher Lewis, Calvin Lin, Lawrence Snyder
1998 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation - PLDI '98  
Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction.  ...  We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages.  ...  This research was conducted using the resources of the Cornell Theory Center, which receives major funding from the National Science Foundation and New York State, with additional support from the National  ... 
doi:10.1145/277650.277663 dblp:conf/pldi/LewisLS98 fatcat:j5pz5cn75rectbklnqdgw5m2me

The implementation and evaluation of fusion and contraction in array languages

E. Christopher Lewis, Calvin Lin, Lawrence Snyder
1998 SIGPLAN notices  
Most compilers address this problem by simply scalarizing the array language and relying on a scalar language compiler to perform loop fusion and array contraction.  ...  We instead show that there are advantages to performing a form of loop fusion and array contraction at the array level. This paper describes this approach and explains its advantages.  ...  This research was conducted using the resources of the Cornell Theory Center, which receives major funding from the National Science Foundation and New York State, with additional support from the National  ... 
doi:10.1145/277652.277663 fatcat:mmz6zbfeprgobl46veaxgnncr4

Loop fusion for memory space optimization

Antoine Fraboulet, Karen Kodary, Anne Mignotte
2001 Proceedings of the 14th international symposium on Systems synthesis - ISSS '01  
This paper presents an optimal algorithm to reduce the use of temporary arrays by loop fusion. Although the algorithm is not polynomial, experiments have shown that it is very efficient.  ...  These transformations are performed on "for" loops that constitute the main parts which handle the arrays of the multimedia code.  ...  Modeling the Problem We approximate memory size requirement by the maximum size of time overlapping arrays.  ... 
doi:10.1145/500001.500025 fatcat:bmri6fzjkvegpkr7cgalgywvu4

Loop fusion for memory space optimization

Antoine Fraboulet, Karen Kodary, Anne Mignotte
2001 Proceedings of the 14th international symposium on Systems synthesis - ISSS '01  
This paper presents an optimal algorithm to reduce the use of temporary arrays by loop fusion. Although the algorithm is not polynomial, experiments have shown that it is very efficient.  ...  These transformations are performed on "for" loops that constitute the main parts which handle the arrays of the multimedia code.  ...  Modeling the Problem We approximate memory size requirement by the maximum size of time overlapping arrays.  ... 
doi:10.1145/500024.500025 fatcat:yatpxeiekbfr7m4xksqaggga3u

Schedule Synthesis for Halide Pipelines on GPUs

Savvas Sioutas, Sander Stuijk, Twan Basten, Henk Corporaal, Lou Somers
2020 ACM Transactions on Architecture and Code Optimization (TACO)  
The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization  ...  As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities.  ...  Loop tiling is often used alongside kernel fusion to exploit parallelism and enable both spatial and temporal reuse across stages.  ... 
doi:10.1145/3406117 fatcat:wqtxe4g7hnc6lcwgjbuhfirnk4

A Case Study of Some Issues in the Optimization of Fortran 90 Array Notation

John D. McCalpin
1996 Scientific Programming  
Special attention is paid to the optimization of memory use and memory traffic.  ...  Some issues in the relationship of coding style and compiler optimization are discussed with regard to Fortran 90 array notation.  ...  ACKNOWLEDGMENTS I thank John Prentice and Reg Clemens of Quetzal Computational Associates, Preston Briggs of Tera Computer, and Jon Steidel and Bill Homer of Cray Research for helpful comments on earlier  ... 
doi:10.1155/1996/208679 fatcat:aetkysbwabalxhytuupnuzff4q

Optimizing the memory bandwidth with loop fusion

Paul Marchal, José Ignacio Gómez, Francky Catthoor
2004 Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis - CODES+ISSS '04  
At the same time, the assignment also strongly influences the energy cost. Therefore, we combine in our approach the fusion and assignment decisions.  ...  We propose a technique to optimize the memory bandwidth across the boundaries of a basic block. Our technique incrementally fuses loops to better use the available bandwidth.  ...  We apply our loop fusion algorithm for different memory sizes. We always use the fastest possible assignment.  ... 
doi:10.1145/1016720.1016767 dblp:conf/codes/MarchalGC04 fatcat:vz5wh3q5zfcghpnnzq627wvxua

Influence of Loop Optimizations on Energy Consumption of Multi-bank Memory Systems [chapter]

Mahmut Kandemir, Ibrahim Kolcu, Ismail Kadayif
2002 Lecture Notes in Computer Science  
To test our approach, we have implemented bank-conscious versions of three loop transformation techniques (loop fission/fusion, linear loop transformations, and loop tiling) using an experimental compiler  ...  infrastructure, and measured the energy benefits using nine array-dominated codes.  ...  After applying loop fission and fusion, within the outer for-loop (in Figure 4), each of the nests is optimized using loop permutation and tiling for off-chip memory energy and data locality.  ... 
doi:10.1007/3-540-45937-5_20 fatcat:pjrhrtdr7zhfpcoq35djuasbfm

Shared buffer implementations of signal processing systems using lifetime analysis techniques

P.K. Murthy, S.S. Bhattacharyya
2001 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
In this paper, we build on our previously developed analysis and optimization framework for looped schedules to formally tackle the problem of generating optimally compact schedules for SDF graphs.  ...  The method we use is that of lifetime analysis; we develop a model for buffer lifetimes in SDF graphs and develop scheduling algorithms that attempt to generate schedules that minimize the maximum number  ...  Edwards and their anonymous referees, for their helpful remarks for improving the readability and presentation of the paper.  ... 
doi:10.1109/43.908427 fatcat:mxr3qdgrtbadtccfs3rdeb3isu

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives [article]

Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul
2020 arXiv   pre-print
However, given the constant emergence of new DNN architectures, creating hand optimized code is expensive, slow and is not scalable.  ...  Once the model is created, its use in the intended application - the inference task, is computationally heavy too and the inference needs to be fast for real time use.  ...  PolyDL performs outer loop optimization around the call to the matrix multiplication microkernel by loop reordering and tiling using various tile sizes.  ... 
arXiv:2006.02230v2 fatcat:sjxmey3zzndupip75cf3uae4rq

Optimizing Inter-Nest Data Locality Using Loop Splitting and Reordering

Sofiane Naci
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
In this paper, we present a compiler strategy that optimizes inter-nest data locality using code restructuring and loop transformations.  ...  The transformed program is then further optimized using loop transformations.  ...  Figure 4. Effects of splitting and fusion on code size. Figure 5. Effects of cache size (a) on effective data access time and (b) on average miss rate. Figure 6. Effects of cache size on k14 and  ... 
doi:10.1109/ipdps.2007.370399 dblp:conf/ipps/Naci07 fatcat:5f7lijpyiredlcz5nfk2cdq5pu

Advanced Scalarization of Array Syntax [chapter]

Gerald Roth
2000 Lecture Notes in Computer Science  
These same compilers then make additional subsequent passes to perform loop optimizations such as loop fusion.  ...  Standard Scalarization and Optimization At some point during the compilation of a Fortran 90 program, array assignment statements must be translated into serial DO-loops. This process is known as  ...  Acknowledgments I'd like to thank Robert Corbett, Larry Meadows, and Prakash Narayan of Sun Microsystems for their support of this work.  ... 
doi:10.1007/3-540-46423-9_15 fatcat:bxz2f2jbkrapdpi2xj7txaffpa

Collective loop fusion for array contraction [chapter]

G. Gao, R. Olsen, V. Sarkar, R. Thekkath
1993 Lecture Notes in Computer Science  
In this paper we propose a method for applying the loop fusion and array contraction optimizations across a collection of loop nests.  ...  Our partitioning method is both novel and efficient; it is novel in that it uses the max-flow min-cut algorithm to minimize the cost of array operations, and it is efficient in that it executes in polynomial-time  ...  Acknowledgments We gratefully acknowledge the support from IBM Corporation, the National Science and Engineering Research Council (NSERC), and Radhika Thekkath was supported in part by NSF PYI Award #MIP  ... 
doi:10.1007/3-540-57502-2_53 fatcat:2moutk2qifbmdbxclfhwb7jnnu
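
Several of the results above concern loop fusion combined with array contraction. As a rough illustration only, and not taken from any of the listed papers, the following Python sketch shows the general idea: two loops that communicate through a full-length temporary array are merged into one, after which the temporary can shrink to a scalar. The function names `unfused` and `fused_contracted` and the arithmetic are hypothetical and chosen purely for the example.

```python
def unfused(a, b):
    """Two separate loops; `tmp` is a full-length temporary array."""
    n = len(a)
    tmp = [0.0] * n
    out = [0.0] * n
    for i in range(n):          # loop 1: produce the temporary
        tmp[i] = a[i] + b[i]
    for i in range(n):          # loop 2: consume the temporary
        out[i] = tmp[i] * 2.0
    return out


def fused_contracted(a, b):
    """Same computation after fusing the loops and contracting `tmp`."""
    n = len(a)
    out = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]         # the temporary now lives in a single scalar
        out[i] = t * 2.0
    return out
```

Both functions return the same values for equal-length inputs; the fused version simply avoids allocating and re-reading the intermediate array, which is the memory saving that array contraction aims for.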