A general algorithm for tiling the register level

M. Jiménez, J. M. Llabería, A. Fernández, E. Morancho
1998 Proceedings of the 12th international conference on Supercomputing - ICS '98  
Previous work on tiling and also commercial compilers are able to perform tiling for the register level in more than one dimension when the iteration space is rectangular.  ...  In this paper we present a new general algorithm to perform tiling for the register level in more than one dimension in both rectangular and nonrectangular iteration spaces.  ...  TILING FOR THE REGISTER LEVEL In this section we will describe the transformation steps carried out by our method to perform tiling for the register level.  ... 
doi:10.1145/277830.277859 dblp:conf/ics/JimenezLFM98 fatcat:y6zj5ktajnbjfh6izyoilgxgyy

A Systematic Approach to Model-Guided Empirical Search for Memory Hierarchy Optimization [chapter]

Chun Chen, Jacqueline Chame, Mary Hall, Kristina Lerman
2006 Lecture Notes in Computer Science  
The goal of this work is a systematic approach to compiler optimization for simultaneously optimizing across multiple levels of the memory hierarchy.  ...  for the search.  ...  Conclusion This paper shows how the problem of optimizing for multiple levels of the memory hierarchy can be recast as a multi-variable optimization problem.  ... 
doi:10.1007/978-3-540-69330-7_30 fatcat:6akstbukbvbf3n6jgbrcp52tqy

Compact multi-dimensional kernel extraction for register tiling

Lakshminarayanan Renganarayana, Uday Bondhugula, Salem Derisavi, Alexandre E. Eichenberger, Kevin O'Brien
2009 Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis - SC '09  
We show that by using COMDEX as a pre-processing to register tiling we can (i) enable register tiling on complex loop structures and (ii) realize a significant performance improvement on a variety of codes  ...  They often fail to operate on complex loop structures leaving a significant amount of performance on the table.  ...  Evaluation We evaluate the performance gains due to COMDEX based register tiling by comparing it with the three different schemes, viz., XLSMP, AutoPoly, and PrimeTile.  ... 
doi:10.1145/1654059.1654105 dblp:conf/sc/RenganarayanaBDEO09 fatcat:wcvfqonpr5h6rcyvuixudh26uu

A methodology pruning the search space of six compiler transformations by addressing them together as one problem and by exploiting the hardware architecture details

Vasilios Kelefouras
2017 Computing  
The transformations are the following: loop tiling (including the number of the levels of tiling), loop unroll, register allocation, scalar replacement, loop interchange and data array layouts.  ...  The proposed methodology has been evaluated over iterative compilation and gcc/icc compilers, on both embedded and general purpose processors; it achieves significant performance gains at many orders of  ...  The initial search space is shown in Fig. 1; for a two level cache architecture it includes one level of tiling (tiling for the L1 or L2 cache), 2 levels of tiling (tiling for both L1 and L2 cache) and  ... 
doi:10.1007/s00607-016-0535-4 fatcat:kgw2ys3qfzbknbsr564pvdulsu

Locality Optimization for Data Parallel Programs [article]

Eric Hielscher, Alex Rubinsteyn, Dennis Shasha
2013 arXiv pre-print
Applying this transformation once tiles the program for cache, and applying it again enables tiling for registers.  ...  The sizes for cache tiles are left unspecified until runtime, when an autotuning search is performed.  ...  We present a high-level syntactic transformation which enables tiling for better use of cache and registers.  ... 
arXiv:1304.1835v1 fatcat:npsyicogqfhozjsptavcc3wzta

MODESTO

Tobias Gysi, Tobias Grosser, Torsten Hoefler
2015 Proceedings of the 29th ACM on International Conference on Supercomputing - ICS '15  
Code transformations, such as loop tiling and loop fusion, are of key importance for the efficient implementation of stencil computations.  ...  that yield optimal performance.  ...  We thank Oliver Fuhrer (MeteoSwiss) and Carlos Osuna Escamilla (ETH) for helpful discussions, Armin Größlinger (University of Passau) for providing isl bindings for Java, as well as the Swiss National  ... 
doi:10.1145/2751205.2751223 dblp:conf/ics/GysiGH15 fatcat:5k2xhp4vejdy7jt7out7tzqm2i

Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture [chapter]

Elkin Garcia, Jaime Arteaga, Robert Pavel, Guang R. Gao
2014 Lecture Notes in Computer Science  
The main contributions of this paper are: (1) The modeling and analysis of energy consumption and energy efficiency for LU factorization; (2) the study and design of instruction-level and task-level optimizations  ...  for the reduction of the Static and Dynamic Energy; (3) the design and implementation of an energy aware tiling that decreases the Dynamic Energy of power hungry instructions in the LU factorization benchmark  ...  We also thank ET International, Inc. for its support during the course of experiments. Finally, we thank the reviewers for their valuable suggestions.  ... 
doi:10.1007/978-3-319-09967-5_14 fatcat:rybw6exgsnctrnwjylbntdvyi4

Combining analytical and empirical approaches in tuning matrix transposition

Qingda Lu, Sriram Krishnamoorthy, P. Sadayappan
2006 Proceedings of the 15th international conference on Parallel architectures and compilation techniques - PACT '06  
In this paper, we develop an integrated optimization framework that addresses a number of issues, including tiling for the memory hierarchy, effective handling of memory misalignment, utilizing memory  ...  The approach highlights aspects of empirical optimization that are important for similar computations with little temporal reuse.  ...  Acknowledgments We would like to thank the Ohio Supercomputer Center (OSC) for the use of their computing facilities.  ... 
doi:10.1145/1152154.1152190 dblp:conf/IEEEpact/LuKS06 fatcat:dtfss24gpfe7znuzk2hdpexcvy

Loop Optimization using Hierarchical Compilation and Kernel Decomposition

Denis Barthou, Sebastien Donadio, Patrick Carribault, Alexandre Duchateau, William Jalby
2007 International Symposium on Code Generation and Optimization (CGO'07)  
We propose a new hierarchical compilation approach for the generation of high performance code relying on the use of state-of-the-art compilers.  ...  The increasing complexity of hardware features for recent processors makes high performance code generation very challenging.  ...  For the evaluation of kernels, two key parameters are explored: • Loop bounds: they correspond to tile sizes.  ... 
doi:10.1109/cgo.2007.22 dblp:conf/cgo/BarthouDCDJ07 fatcat:vsxdunyikfapnngqas6ulauk5q

Profitable loop fusion and tiling using model-driven empirical search

Apan Qasem, Ken Kennedy
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
Loop fusion and tiling are both recognized as effective transformations for improving memory performance of scientific applications.  ...  Our strategy consists of a detailed cost model that characterizes the interaction between the two transformations at different levels of the memory hierarchy.  ...  In some cases, improved cache performance comes at a cost of increased register pressure, while in other situations, reduced TLB misses might result in lost locality for some level of cache.  ... 
doi:10.1145/1183401.1183437 dblp:conf/ics/QasemK06 fatcat:puoomxvmwvfpjhwjtc5kjmnidy

Gemmini: Enabling Systematic Deep-Learning Architecture Evaluation via Full-Stack Integration [article]

Hasan Genc, Seah Kim, Alon Amid, Ameer Haj-Ali, Vighnesh Iyer, Pranav Prakash, Jerry Zhao, Daniel Grubb, Harrison Liew, Howard Mao, Albert Ou, Colin Schmidt (+7 others)
2021 arXiv pre-print
DNN accelerators are often developed and evaluated in isolation without considering the cross-stack, system-level effects in real-world environments.  ...  This makes it difficult to appreciate the impact of System-on-Chip (SoC) resource contention, OS overheads, and programming-stack inefficiencies on overall performance/energy-efficiency.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.  ... 
arXiv:1911.09925v3 fatcat:yftbmax3c5dqtfvovhyz57oihy

Model-Guided Empirical Optimization for Multimedia Extension Architectures: A Case Study

Chun Chen, Jaewook Shin, Shiva Kintali, Jacqueline Chame, Mary Hall
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
Compiler technology for multimedia extensions must effectively utilize not only the SIMD compute engines but also the various levels of the memory hierarchy: superword registers, multi-level caches and  ...  At the high-level, model-guided empirical optimization is used to transform code to optimize for all levels of the memory hierarchy.  ...  For each memory hierarchy level, a number of variants may be derived, and each such variant is processed when evaluating optimizations for the next level.  ... 
doi:10.1109/ipdps.2007.370641 dblp:conf/ipps/ChenSKCH07 fatcat:awdgu3nk45dwdcvxoejdavsfey

Urban Passive Cooling. Aging Effects on Optical Properties of Roof Tiles

Alchapar Noelia, Correa Erica, Cantón M. Alicia
2014 Energy Procedia  
This work evaluates the influence of wear on the thermal performance of diverse roof tiles available in the region of Mendoza, Argentina. 16 roof tiles of different characteristics (colour, shape, composition  ...  When the percentage of solar radiation absorbed by the material of the urban envelope is diminished, its surface temperature can be reduced, thus the heat level is minimized.  ...  The objective of this study is to evaluate the influence of wear in the modification of the thermal performance of diverse roof tiles.  ... 
doi:10.1016/j.egypro.2015.06.068 fatcat:djmmjhwajbc6pplkai4rhrbo7q

swSpTRSV

Xinliang Wang, Wei Xue, Weifeng Liu, Li Wu
2018 Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '18  
We propose a new data layout called Sparse Level Tile, or SLT for short, to divide a sparse matrix into two types of 2D tiles with nonuniformed shapes.  ...  This makes the performance of parallel SpTRSV through the level-set methods far from satisfactory.  ...  Acknowledgments The authors would like to thank all anonymous reviewers for their insightful comments and suggestions.  ... 
doi:10.1145/3178487.3178513 dblp:conf/ppopp/WangLXW18 fatcat:tgsa7oatxva5hlpu3mlbu5f7oe

Strategies for improving performance and energy efficiency on a many-core

Elkin Garcia, Guang Gao
2013 Proceedings of the ACM International Conference on Computing Frontiers - CF '13  
The research proposed here will provide an analysis of these new scenarios, proposing new methodologies and solutions that leverage these new challenges in order to increase the performance and energy  ...  New many-core architectures are characterized not only by the large amount of processing elements but also by the large number and heterogeneity of resources.  ...  ACKNOWLEDGEMENTS This work has been made possible by the generous support of the NSF through research grants CCF-0833122, CCF-0925863, CCF-0937907, CNS-0720531, and OCI-0904534.  ... 
doi:10.1145/2482767.2482779 dblp:conf/cf/GarciaG13 fatcat:n2syt3shvndjxazsdjyggacrka