Schedule Synthesis for Halide Pipelines on GPUs

Savvas Sioutas, Sander Stuijk, Twan Basten, Henk Corporaal, Lou Somers
2020 ACM Transactions on Architecture and Code Optimization (TACO)  
The Halide DSL and compiler have enabled high-performance code generation for image processing pipelines targeting heterogeneous architectures through the separation of algorithmic description and optimization  ...  As a result, expert knowledge is still required when optimizing for platforms with GPU capabilities.  ...  We integrated our model into the Halide autoscheduler and tested it on a variety of image processing pipelines.  ... 
doi:10.1145/3406117 fatcat:wqtxe4g7hnc6lcwgjbuhfirnk4

Schedule Synthesis for Halide Pipelines through Reuse Analysis

Savvas Sioutas, Sander Stuijk, Luc Waeijen, Twan Basten, Henk Corporaal, Lou Somers
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
The inherently complex structure found in most image-processing pipelines, the plethora of transformations that can be applied to optimize the performance of an implementation, as well as the interaction  ...  We propose a novel optimization strategy that aims to maximize producer-consumer locality by exploiting reuse in image-processing pipelines.  ...  In this article, we present a novel optimization strategy for image-processing pipelines that considers stage fusion for maximum producer/consumer locality in conjunction with tile size selection while  ... 
doi:10.1145/3310248 fatcat:mw6onmuhevgkxjla5cugx2cu5u

Flextended Tiles

Jie Zhao, Albert Cohen
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
Loop tiling to exploit data locality and parallelism plays an essential role in a variety of general-purpose and domain-specific compilers.  ...  loop-nest optimizers and the fair comparison with other techniques.  ...  We would like to thank Michael Kruse, Chandan Reddy, Sven Verdoolaege and Oleksandr Zinenko for their tremendous support and valuable suggestions.  ... 
doi:10.1145/3369382 fatcat:rofcrtjlbre2zpoilctdvbklge

Halide

Jonathan Ragan-Kelley, Andrew Adams, Dillon Sharlet, Connelly Barnes, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand
2017 Communications of the ACM  
As a result, the performance difference between a naïve implementation of a pipeline and one globally optimized for parallelism and locality is often an order of magnitude.  ...  This is especially true for image processing pipelines, where individual stages do much too little work to amortize the cost of loading and storing results to and from off-chip memory.  ...  Most notably, Zalman Stern, Steven Johnson, and Patricia Suriana are full-time developers on the project at Google and are responsible for a large amount of the current code.  ... 
doi:10.1145/3150211 fatcat:4vhmxunjofam7daaaeiw5ssc7a

Halide

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe
2013 Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation - PLDI '13  
We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline  ...  , and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule.  ...  This work was supported by DOE Award DE-SC0005288, NSF grant 0964004, grants from Intel and Quanta, and gifts from Cognex and Adobe.  ... 
doi:10.1145/2491956.2462176 dblp:conf/pldi/Ragan-KelleyBAPDA13 fatcat:tr3fzvh5arbbbo4nn2iqpivdaa

Halide

Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe
2013 SIGPLAN notices  
We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline  ...  , and an optimizing compiler for the Halide image processing language that synthesizes high performance implementations from a Halide algorithm and a schedule.  ...  This work was supported by DOE Award DE-SC0005288, NSF grant 0964004, grants from Intel and Quanta, and gifts from Cognex and Adobe.  ... 
doi:10.1145/2499370.2462176 fatcat:afs2mud2unentdmcazyg2qhiqq

PolyMage

Ravi Teja Mullapudi, Vinay Vasista, Uday Bondhugula
2015 SIGPLAN notices  
To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization automatically.  ...  An image processing pipeline can be viewed as a graph of interconnected stages which process images successively.  ...  Acknowledgments We gratefully acknowledge the authors of Halide for developing and actively maintaining Halide as an open-source project.  ... 
doi:10.1145/2775054.2694364 fatcat:fsc5rtnddrhjvei73m6twz5mty

Training of deep learning pipelines on memory-constrained GPUs via segmented fused-tiled execution

Yufan Xu, Saurabh Raje, Atanas Rountev, Gerald Sabin, Aravind Sukumaran-Rajam, P. Sadayappan
2022 Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction  
Therefore, existing Deep Learning pipelines for these use cases have been forced to develop sub-optimal "patch-based" modeling approaches, where images are processed in small segments of an image.  ...  In this paper, we present a solution to this problem by employing tiling in conjunction with check-pointing, thereby enabling arbitrarily large images to be directly processed, irrespective of the size  ...  For 10kx10k image, we can use 16x16 tiles, and for 20kx20k image, we can use 32x32 tiles (See uu/benchmarking/large.sh).  ... 
doi:10.1145/3497776.3517766 pmid:35876769 pmcid:PMC9302555 fatcat:4ghjtvtwtvg7tevoedi2wmgn4e

Decoupling algorithms from schedules for easy optimization of image processing pipelines

Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand
2012 ACM Transactions on Graphics  
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and  ...  We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.  ...  Acknowledgments This work was partially funded by the Quanta T-Party, NSF grants 0964004, 0964218, and 0832997, DOE award DE-SC0005288, and gifts from Cognex and Adobe.  ... 
doi:10.1145/2185520.2185528 fatcat:mdfbhjwc5zatfoq4mt6p3j7m7a

Decoupling algorithms from schedules for easy optimization of image processing pipelines

Jonathan Ragan-Kelley, Andrew Adams, Sylvain Paris, Marc Levoy, Saman Amarasinghe, Frédo Durand
2012 ACM Transactions on Graphics  
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and  ...  We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.  ...  Acknowledgments This work was partially funded by the Quanta T-Party, NSF grants 0964004, 0964218, and 0832997, DOE award DE-SC0005288, and gifts from Cognex and Adobe.  ... 
doi:10.1145/2185520.2335383 fatcat:smf4xdhjpzhn3ggl2qyilol22y

A Survey on System-Level Design of Neural Network Accelerators

Kenshu Seto
2021 Journal of Integrated Circuits and Systems  
For the nested loop of convolutional (CONV) layers, we discuss the effects of loop optimizations such as loop interchange, tiling, unrolling and fusion on CNN accelerators.  ...  Optimizations for CNN models are briefly explained, followed by the recent trends and future directions of the CNN accelerator design.  ...  Typically, loop fusion is effective for fusing loops with the similar loop bounds.  ... 
doi:10.29292/jics.v16i2.505 fatcat:ibbkeob42jepbguezlptws2qha

Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations

Jun Shirako, Louis-Noel Pouchet, Vivek Sarkar
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
In addition, several key transformations such as pipeline-parallelism and unroll-and-jam are difficult to express in the polyhedral framework.  ...  However, it usually implements abstract optimization objectives, for example "maximize data reuse", which often does not deliver best performance, e.g., the complex loop structures generated can be detrimental  ...  ACKNOWLEDGMENT This work was supported in part by the Center for Domain Specific Computing (NSF Expeditions in Computing Award CCF-0926127).  ... 
doi:10.1109/sc.2014.29 dblp:conf/sc/ShirakoPS14 fatcat:lgdqaqc2ibcknnefuum5sfttq4

Automatically scheduling Halide image processing pipelines

Ravi Teja Mullapudi, Andrew Adams, Dillon Sharlet, Jonathan Ragan-Kelley, Kayvon Fatahalian
2016 ACM Transactions on Graphics  
The Halide image processing language has proven to be an effective system for authoring high-performance image processing code.  ...  Unfortunately, designing high-performance schedules for complex image processing pipelines requires substantial knowledge of modern hardware architecture and code-optimization techniques.  ...  NVIDIA), and by equipment donations from NVIDIA.  ... 
doi:10.1145/2897824.2925952 fatcat:nr2o5mqsmncutdyxiuzrnont5e

DNNVM : End-to-End Compiler Leveraging Heterogeneous Optimizations on FPGA-based CNN Accelerators [article]

Yu Xing, Shuang Liang, Lingzhi Sui, Xijie Jia, Jiantao Qiu, Xin Liu, Yushun Wang, Yu Wang, Yi Shan
2019 arXiv   pre-print
DNNVM enumerates all potentially profitable fusion opportunities by a heuristic subgraph isomorphism algorithm to leverage pipeline and data layout optimizations, and searches for the best choice of execution  ...  We propose the full-stack compiler DNNVM, which is an integration of optimizers for graphs, loops and data layouts, and an assembler, a runtime supporter and a validation environment.  ...  In the future, a mathematical model-driven method which automatically decides the sizes for slicing and fusion needs to be further explored.  ... 
arXiv:1902.07463v2 fatcat:5kwsfakmojht5nn2qj6uuuwwsy

Forma: a DSL for image processing applications to target GPUs and multi-core CPUs

Mahesh Ravishankar, Justin Holewinski, Vinod Grover
2015 Proceedings of the 8th Workshop on General Purpose Processing using GPUs - GPGPU 2015  
Here we present Forma, a DSL for image processing applications that targets both CPUs and GPUs.  ...  Such an integration would allow users of such tools to develop efficient implementations easily.  ...  Sadayappan from Ohio State University for his comments. Finally, we thank the reviewers of this paper for their helpful comments regarding related work and possible enhancements.  ... 
doi:10.1145/2716282.2716290 dblp:conf/ppopp/RavishankarHG15 fatcat:cwggdu43ufczdjtiz4bluw7bba