
Polyhedral-based data reuse optimization for configurable computing

Louis-Noel Pouchet, Peng Zhang, P. Sadayappan, Jason Cong
2013 Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays - FPGA '13  
Our framework effectively implements data reuse through aggressive loop transformation-based program restructuring.  ...  We leverage the power and expressiveness of the polyhedral compilation model to develop a multi-objective optimization system for off-chip communications management.  ...  Acknowledgment This work was supported by the Center for Domain-Specific Computing (CDSC) funded by NSF "Expeditions in Computing" award 0926127, and the Gigascale Systems Research Center (GSRC).  ... 
doi:10.1145/2435264.2435273 dblp:conf/fpga/PouchetZSC13 fatcat:zaashltg3bcw7pasjfjh6lep3e

A scalable auto-tuning framework for compiler optimization

Ananta Tiwari, Chun Chen, Jacqueline Chame, Mary Hall, Jeffrey K. Hollingsworth
2009 2009 IEEE International Symposium on Parallel & Distributed Processing  
Our search algorithm simultaneously evaluates different combinations of compiler optimizations and converges to solutions in only a few tens of search-steps.  ...  We describe a scalable and general-purpose framework for auto-tuning compiler-generated code.  ...  LeTSeE [17] is an iteration optimization tool based on the polyhedral model.  ... 
doi:10.1109/ipdps.2009.5161054 dblp:conf/ipps/TiwariCCHH09 fatcat:bhgee57vovb3tpu3csp6xaxopm

Stripe: Tensor Compilation via the Nested Polyhedral Model [article]

Tim Zerrell, Jeremy Bruestle
2019 arXiv   pre-print
Here we present a Nested Polyhedral Model for representing highly parallelizable computations with limited dependencies between iterations.  ...  Stripe represents parallelism, efficient memory layout, and multiple compute units at a level of abstraction amenable to automatic optimization.  ...  Particular thanks to Leona Cook for editing, and thanks to Madhur Amilkanthwar, Priya Arora, Mikaël Bourges-Sévenier, Cormac Brick, Diego Caballero, Namrata Choudhury, Rob Earhart, Frank Laub, Alessandro  ... 
arXiv:1903.06498v1 fatcat:feqgth6yxvdhxb652jv5vwwlea

AdaptMemBench: Application-Specific Memory Subsystem Benchmarking [article]

Mahesh Lakshminarasimhan, Catherine Olschanowsky
2018 arXiv   pre-print
This framework can explore the performance characteristics of a wide range of access patterns and can be used as a testbed for potential optimizations due to the flexibility of polyhedral code generation. We  ...  Optimizing scientific applications to take full advantage of modern memory subsystems is a continual challenge for application and compiler developers.  ...  Tiling Optimization for Jacobi transformations Rectangular space tiling [8] is one of the traditional optimization strategies for stencil computations.  ... 
arXiv:1812.07778v1 fatcat:53h5gupql5a2vnvcbsulphfyya

DRDU

Ilya Issenin, Erik Brockmeyer, Miguel Miranda, Nikil Dutt
2007 ACM Transactions on Design Automation of Electronic Systems  
set of buffers for local storage of frequently reused data.  ...  In this article we present an automated approach for analyzing these opportunities in a program that allows modification of the program to use custom scratch-pad memory configurations comprising a hierarchical  ...  DRDU: Data Reuse Analysis Algorithm and its Use for Memory Subsystem Optimization In this section we describe our approach for building a custom scratch-pad memory-based system using our data reuse analysis  ... 
doi:10.1145/1230800.1230807 fatcat:5vdyhfx7rbdq3mk7zjkbcygwei

PolyBench/Python: benchmarking Python environments with polyhedral optimizations

Miguel Á. Abella-González, Pedro Carollo-Fernández, Louis-Noël Pouchet, Fabrice Rastello, Gabriel Rodríguez
2021 Proceedings of the 30th ACM SIGPLAN International Conference on Compiler Construction  
Polyhedral Optimizations NumPy: Loop-based vs.  ...  Stencil computations time-iterated times have their data reused ( ) times.  ... 
doi:10.1145/3446804.3446842 fatcat:nb5t5lox3fhfjbgbrmh6t7usri

Nested-Loops Tiling for Parallelization and Locality Optimization

Saeed Parsa, Mohammad Hamzei
2017 Computing and informatics  
Data locality improvement and nested loops parallelization are two complementary and competing approaches for optimizing loop nests that constitute a large portion of computation times in scientific and  ...  Furthermore, tiles will be scheduled on processor cores to exploit maximum data reuse through scheduling tiles with high volume of data sharing on the same core consecutively or on different cores with  ...  In the proposed framework, a unified step by step approach for nested loops parallelization along with data locality optimization is applied. • A novel algebraic-based method to compute suitable transformation  ... 
doi:10.4149/cai_2017_3_566 fatcat:m2v4wqrfb5erjahbsuddupw26m

CMOST

Peng Zhang, Muhuan Huang, Bingjun Xiao, Hui Huang, Jason Cong
2015 Proceedings of the 52nd Annual Design Automation Conference on - DAC '15  
We also present several novel techniques on integrating optimizations in CMOST, including task-level dependence analysis, block-based data streaming, and automated SDF generation.  ...  CMOST establishes a unified framework for the integration of various system-level optimizations and for different hardware platforms.  ...  This research is partially supported by the NSF Expeditions in Computing Award CCF-0926127.  ... 
doi:10.1145/2744769.2744807 dblp:conf/dac/ZhangHXHC15 fatcat:b2i7ojxmmjdwffcazibpuxvhym

Data Reuse Buffer Synthesis Using the Polyhedral Model

Wim Meeus, Dirk Stroobandt
2018 IEEE Transactions on Very Large Scale Integration (vlsi) Systems  
Current high-level synthesis (HLS) tools for the automatic design of computing hardware perform excellently for the synthesis of computation kernels, but they often do not optimize memory bandwidth.  ...  Throughout this paper, the polyhedral representation is used extensively as it proves to be well suited for calculations on loop nests and data accesses.  ...  [10] for sharing their benchmarks.  ... 
doi:10.1109/tvlsi.2018.2817159 fatcat:mqxq6dbh2bgcfljl3r2sqfwppm

Hardware Acceleration of Sparse and Irregular Tensor Computations of ML Models: A Survey and Insights [article]

Shail Dave, Riyadh Baghdadi, Tony Nowatzki, Sasikanth Avancha, Aviral Shrivastava, Baoxin Li
2021 arXiv   pre-print
their efficient computations; analyzing trade-offs in opting for a specific design choice for encoding, storing, extracting, communicating, computing, and load-balancing the non-zeros; understanding how  ...  recent DNNs; highlights further opportunities in terms of hardware/software/model co-design optimizations (inter/intra-module).  ...  While interval-based compilers can also implement non-polyhedral dependence analysis (by computing dependence distance vectors [227]), it is not as precise as polyhedral dependence analysis [226].  ... 
arXiv:2007.00864v2 fatcat:k4o2xboh4vbudadfiriiwjp7uu

Locality-Conscious Nested-Loops Parallelization

Saeed Parsa
2014 ETRI Journal  
These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.  ...  Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor.  ...  A majority of the related works on data locality optimization is based on loop transformations.  ... 
doi:10.4218/etrij.13.0113.0266 fatcat:ibwilepb3rexzpeg2rdpvyphvq

Locality-Conscious Nested-Loops Parallelization

Saeed Parsa, Mohammad Hamzei
2014 ETRI Journal  
These groups can be further tiled to improve data locality through exploiting data reuse in multiple dimensions.  ...  Effective parallelization techniques distribute the computation and necessary data across different processors, whereas data locality places data on the same processor.  ...  A majority of the related works on data locality optimization is based on loop transformations.  ... 
doi:10.4218/etrij.14.0113.0266 fatcat:53npwvnoznbg3pqtev3y36rwci

The Deep Learning Compiler: A Comprehensive Survey [article]

Mingzhen Li, Yi Liu, Xiaoyan Liu, Qingxiao Sun, Xin You, Hailong Yang, Zhongzhi Luan, Lin Gan, Guangwen Yang, Depei Qian
2020 arXiv   pre-print
Similarly, the DL compilers take the DL models described in different DL frameworks as input, and then generate optimized codes for diverse DL hardware as output.  ...  This is the first survey paper focusing on the design architecture of DL compilers, which we hope can pave the road for future research towards DL compiler.  ...  ACKNOWLEDGEMENTS The authors would like to thank Jun Yang from Alibaba, Yu Xing from Xilinx, and Byung Hoon Ahn from UCSD for their valuable comments and suggestions.  ... 
arXiv:2002.03794v4 fatcat:owj6qygxhrhxjag5ifam65vhja

Simplifying Multiple-Statement Reductions with the Polyhedral Model [article]

Cambridge Yang, Eric Atkinson, Michael Carbin
2020 arXiv   pre-print
Contemporary polyhedral-based compilation techniques make it possible to optimize reductions, such as prefix sum, in which each component of the reduction's output potentially shares computation with another  ...  We present a heuristic optimization algorithm for these reductions, and we demonstrate that the algorithm provides optimal complexity for a set of benchmark programs from the literature on probabilistic  ...  Configuration of Simplification Transformation A fully automated optimizing compiler should automatically identify a reuse direction r and apply ST.  ... 
arXiv:2007.11203v1 fatcat:tnlgrpfd5vcvbn33rxfw4mgfta

Simplifying dependent reductions in the polyhedral model

Cambridge Yang, Eric Atkinson, Michael Carbin
2021 Proceedings of the ACM on Programming Languages (PACMPL)  
Contemporary polyhedral-based compilation techniques make it possible to optimize reductions, such as prefix sums, in which each component of the reduction's output potentially shares computation with  ...  The complexities for 10 of the 11 programs improve significantly by factors at least of the sizes of the input data, which are in the range of 10^4 to 10^6 for typical real application inputs.  ...  ACKNOWLEDGMENTS We would like to thank Alex Renda, Charith Mandis, Jesse Michel, Jonathan Frankle, Riyadh Baghdadi, Sanjay Rajopadhye, Sriram Krishnamoorthy, Tian Jin, and anonymous reviewers for their  ... 
doi:10.1145/3434301 fatcat:kuhbvwgtkbhdppvejltqukw5we