CMOST

Peng Zhang, Muhuan Huang, Bingjun Xiao, Hui Huang, Jason Cong
2015 Proceedings of the 52nd Annual Design Automation Conference on - DAC '15  
CMOST establishes a unified framework for the integration of various system-level optimizations and for different hardware platforms.  ...  Programming difficulty is a key challenge to the adoption of FPGAs as a general high-performance computing platform.  ...  ACKNOWLEDGEMENTS The authors would like to thank Young-Kyu Choi, Hassan Kianinejad, Jie Lei, Peng Li, Jie Wang, and Yuxin Wang for the efforts in CMOST development and design case study.  ... 
doi:10.1145/2744769.2744807 dblp:conf/dac/ZhangHXHC15 fatcat:b2i7ojxmmjdwffcazibpuxvhym

Improving polyhedral code generation for high-level synthesis

Wei Zuo, Peng Li, Deming Chen, Louis-Noel Pouchet, Shunan Zhong, Jason Cong
2013 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)  
High-level synthesis (HLS) tools are now capable of generating high-quality RTL codes for a number of programs.  ...  The polyhedral compilation framework has shown great promise in this area with the development of HLS-specific polyhedral transformation techniques and tools.  ...  We are grateful to Xilinx, Inc. for equipment donations and financial contributions.  ... 
doi:10.1109/codes-isss.2013.6659002 dblp:conf/codes/ZuoLCPZC13 fatcat:4e5bazdlfbe7dgu2ullbkzp7uu

Smart-Cache: Optimising Memory Accesses for Arbitrary Boundaries and Stencils on FPGAs

Syed Waqar Nabi, Wim Vanderbauwhede
2019 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
The architecture is complemented by a formal model for determining buffer configuration. We propose a hybrid use of the block and distributed RAM on the FPGA.  ...  To address this problem, we present Smache, a novel smart-caching framework that uses FPGA on-chip memory resources for optimising access for arbitrary stencil shapes and boundary conditions.  ...  Reference [9] presents a polyhedral model-based framework for iterative stencil loops. The use of an OpenCL-based HLS approach for stencil computation was discussed in [10].  ... 
doi:10.1109/ipdpsw.2019.00024 dblp:conf/ipps/NabiV19 fatcat:24gwvhgmujgn5j5yca2h5f6h6q

Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures [article]

Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, Torsten Hoefler
2020 arXiv   pre-print
We demonstrate SDFGs on CPUs, GPUs, and FPGAs over various motifs, from fundamental computational kernels to graph analytics.  ...  We present the Stateful DataFlow multiGraph (SDFG), a data-centric intermediate representation that enables separating program definition from its optimization.  ...  ] corresponding library calls for CPU; NVIDIA CUBLAS • Jacobi Stencil: A 5-point stencil repeatedly computed on a 2,048 square 2D domain for T = 1,024 iterations, with constant (zero) boundary conditions  ... 
arXiv:1902.10345v3 fatcat:4aerjkgf2fguhlbcbrw7g2uw5e

Automated accelerator generation and optimization with composable, parallel and pipeline architecture

Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang
2018 Proceedings of the 55th Annual Design Automation Conference on - DAC '18  
AutoAccel accepts a software program as an input and performs a series of code transformations based on the result of the analytical-model-based design space exploration to construct the desired CPP microarchitecture  ...  On top of the CPP microarchitecture and its analytical model, we develop the AutoAccel framework to make the entire accelerator generation automated.  ...  In this paper we propose the AutoAccel framework to provide a nearly push-button experience on mapping C functions into high-quality FPGA accelerator designs.  ... 
doi:10.1145/3195970.3195999 dblp:conf/dac/CongWYZ18 fatcat:ezisbhayq5hlxfoko437bljdne

Lifting C Semantics for Dataflow Optimization [article]

Alexandru Calotoiu, Tal Ben-Nun, Grzegorz Kwasniewski, Johannes de Fine Licht, Timo Schneider, Philipp Schaad, Torsten Hoefler
2021 arXiv   pre-print
We separate writing code from optimizing for different hardware: simple, portable C source code is used to generate efficient specialized versions with a click of a button.  ...  C is the lingua franca of programming and almost any device can be programmed using C.  ...  provides a domain-specific framework for the optimization of stencil/image-processing kernels, which aims to make  ...  SYCL [2], a cross-platform abstraction layer, provides portable performance by  ... 
arXiv:2112.11879v2 fatcat:sphukfn3xjgetl7upxikb3fgke

Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code [article]

Riyadh Baghdadi, Jessica Ray, Malek Ben Romdhane, Emanuele Del Sozzo, Abdurrahman Akkas, Yunming Zhang, Patricia Suriana, Shoaib Kamil, Saman Amarasinghe
2018 arXiv   pre-print
Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations.  ...  The framework is designed for the areas of image processing, stencils, linear algebra and deep learning.  ...  ACKNOWLEDGEMENTS This work was supported by the ADA Research Center, a JUMP Center co-sponsored by SRC and DARPA.  ... 
arXiv:1804.10694v5 fatcat:v6i7euoftjd43nxeh6jogxwz7u

Towards Automatic High-Level Code Deployment on Reconfigurable Platforms: A Survey of High-Level Synthesis Tools and Toolchains

Mostafa W. Numan, Braden J. Phillips, Gavin S. Puddy, Katrina Falkner
2020 IEEE Access  
A Halide-inspired DSL language, Rigel [100], is another framework for implementing optimised image processing accelerators on FPGAs, and is based on the Darkroom framework.  ...  for efficient FPGA implementation of these loop-nests.  ... 
doi:10.1109/access.2020.3024098 fatcat:hk7s2deq6zgp5fnuwvm5k6jodu

DNN Dataflow Choice Is Overrated [article]

Xuan Yang, Mingyu Gao, Jing Pu, Ankita Nayak, Qiaoyi Liu, Steven Emberton Bell, Jeff Ou Setter, Kaidi Cao, Heonjae Ha, Christos Kozyrakis, Mark Horowitz
2018 arXiv   pre-print
Based on these observations, we develop an optimizer that automatically finds the optimal blocking and storage hierarchy.  ...  However, finding the best blocking and resource allocation is critical, and we achieve a 2.6X energy savings over Eyeriss system by reducing the size of the local register file.  ...  For each dataflow, the loop blocking scheme is optimized to minimize the energy based on the analysis framework in Section V-A, and the utilization ratio is constrained to be higher than 75%.  ... 
arXiv:1809.04070v1 fatcat:dtrnyaf6sfcjhfaiq4duvaie4i

From Domain-Specific Languages to Memory-Optimized Accelerators for Fluid Dynamics [article]

Karl F. A. Friebel, Stephanie Soldavini, Gerald Hempel, Christian Pilato, Jeronimo Castrillon
2021 arXiv   pre-print
We propose an automated tool flow from a domain-specific language (DSL) to generate accelerators for computational fluid dynamics on FPGA.  ...  Most of these numerical algorithms are massively parallel and often implemented on parallel high-performance computers.  ...  The CPU host code executes the accelerator for the total number of elements in the CFD simulation (N_e), requiring N_e/m main loop iterations.  ... 
arXiv:2108.03326v1 fatcat:ekqskqmj7nfk5bxfzqyzq6fzri

AutoDSE: Enabling Software Programmers to Design Efficient FPGA Accelerators [article]

Atefeh Sohrabizadeh, Cody Hao Yu, Min Gao, Jason Cong
2021 arXiv   pre-print
While many learning models have been leveraged by existing work to automate the design of efficient accelerators, the unpredictability of modern HLS tools becomes a major obstacle for them to maintain  ...  Adopting FPGA as an accelerator in datacenters is becoming mainstream for customized computing, but the fact that FPGAs are hard to program creates a steep learning curve for software programmers.  ...  Peichen Pan for his invaluable support with the Merlin Compiler and Dr. Lorenzo Ferretti and Qi Sun for helping with the comparison to their work.  ... 
arXiv:2009.14381v2 fatcat:d4ynz74pubd4pnpko2nvkowqke

HARDWARE-AWARE TILING OPTIMIZATION FOR MULTI-CORE SYSTEMS

Dominik Adamski, Grzegorz Jabłoński
2017 Computer Science  
, Pop A., Pouchet L.N., Govindarajan R., Cohen A., Sadayappan P.: Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs.  ...  Loop boundary conditions are modeled as linear functions that limit iteration space. The dimension of iteration space is equal to the number of nested loops.  ... 
doi:10.7494/csci.2017.18.2.145 fatcat:vz2p74icxfg4vnciy2imq4aimi

JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization [article]

Kazuaki Matsumura, Simon Garcia De Gonzalo, Antonio J. Peña
2021 arXiv   pre-print
The rapid development in computing technology has paved the way for directive-based programming models towards a principal role in maintaining software portability of performance-critical applications.  ...  Efforts on such models involve a least engineering cost for enabling computational acceleration on multiple architectures while programmers are only required to add meta information upon sequential code  ...  Based on Euler's method, 2D stencil grids are updated by each kernel having minimum control logics and halo accesses through double buffers for avoiding dependencies among loop iterations.  ... 
arXiv:2110.14340v1 fatcat:acfa6g7xm5dyfajen7fqkn4yri

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns

Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, Kunle Olukotun
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
optimize for heterogeneous devices.  ...  To optimize distributed applications both for modern hardware and for modern programmers we need a programming model that is sufficiently expressive to support a variety of parallel applications, sufficiently  ...  Acknowledgments We are grateful to the anonymous reviewers for their comments and suggestions.  ... 
doi:10.1145/2854038.2854042 dblp:conf/cgo/BrownLRSSAO16 fatcat:cye5j5gi3vfgzh7xyku5cfttq4

Verification of Polyhedral Optimizations with Constant Loop Bounds in Finite State Space Computations [chapter]

Markus Schordan, Pei-Hung Lin, Dan Quinlan, Louis-Noël Pouchet
2014 Lecture Notes in Computer Science  
We focus on scientific kernels and a state-of-the-art polyhedral compiler implemented in ROSE.  ...  We present a framework to verify if two programs (one possibly being a transformed variant of the other) are semantically equivalent.  ...  PolyOpt/C performs data dependence analysis, loop transformation and code generation based on the polyhedral model.  ... 
doi:10.1007/978-3-662-45231-8_41 fatcat:h6ts7yxq5namhakdtdpwcfc24y