Programming Matrices as Staged Sparse Rows to Generate Efficient Matrix-free Differential Equation Solver [article]

Huanqi Cao, Shizhi Tang, Bowen Yu, Wenguang Chen
2022 arXiv   pre-print
Implementing BT pseudo application in the NAS Parallel Benchmark, with less than 10% lines of code compared with the matrix-free reference FORTRAN implementation, we achieved up to 92.8% performance.  ...  This paper points out that sparse matrices can be represented as programs instead of data, having both the generality from the matrix-based representation and the performance from program optimizations  ...  The branches will be eliminated by the optimizations we will discuss in Section 4.  ... 
arXiv:2204.13304v1 fatcat:vuz7vjuys5fnfebdnf6vcgwaxu

On the Transformation Optimization for Stencil Computation

Huayou Su, Kaifang Zhang, Songzhu Mei
2021 Electronics  
The recipes consist of loop unrolling, loop fusion, address precalculation, redundancy elimination, instruction reordering, load balance, and a forward and backward update algorithm named semi-stencil.  ...  In this paper, we combine the two aspects to study the potential benefits some common transformation recipes may have for stencils.  ...  The general redundancy elimination described in this section is mainly aiming at the actual computation of the generated kernel and solves the redundancy within a single loop iteration and between loop  ... 
doi:10.3390/electronics11010038 fatcat:t4pz3dgomzbujjjcrlookgeumi

Automated generation of High-Performance Computational Fluid Dynamics Codes

Sandra Macià, Pedro J. Martínez-Ferrer, Eduard Ayguade, Vicenç Beltran
2022 Journal of Computational Science  
This paper presents the automated process of generating, from abstract mathematical specifications of Computational Fluid Dynamics (CFD) problems, optimised parallel codes that perform and scale as manually  ...  Our results demonstrate how high-level DSLs can offer competitive performance by transparently leveraging state-of-the-art HPC techniques.  ...  Acknowledgment This research has received funding from the European Union's Horizon 2020/EuroHPC research and innovation programme under grant agreement N.955606 (DEEP-SEA), and is supported by the Spanish  ... 
doi:10.1016/j.jocs.2022.101664 fatcat:7z54kpriyzfdfjlzuam55ayfmm


Thomas Debrunner, Sajad Saeedi, Paul H. J. Kelly
2019 ACM Transactions on Architecture and Code Optimization (TACO)  
In this paper we explore code generation for an FPSP whose 256 × 256 processors operate on analogue signal data, leading to further opportunities for power reduction, and additional code synthesis challenges  ...  This paper presents a code generator for convolution filters for the SCAMP-5 FPSP, with applications in many high level tasks such as convolutional neural networks, pose estimation, etc.  ...  We thank Piotr Dudek and his colleagues at Manchester University for kindly providing access to the SCAMP device and simulator, and Fabio Luporini for helpful comments on the manuscript.  ... 
doi:10.1145/3291055 fatcat:hxrh7jvmqjddtkgi3xyai3puly

Generating Block-Structured Kernels for Low Order Finite Element Methods

Marcel Koch, Christian Engwer, Harald Köstler
2021 Zenodo  
The presented approaches are implemented as part of the code generation framework Dune-Codegen to ease the usage of the optimizations.  ...  By generating the necessary kernels, the same performance as for handwritten implementations can be reached.  ...  Code Generation Framework The code generation pipeline used in this thesis is implemented in the DUNECODEGEN framework, which was established in [54] and [55].  ... 
doi:10.5281/zenodo.4704642 fatcat:ynsfwpzd4rhrzp5fvzbiisfxcy

Abstractions and performance optimisations for finite element methods

Tianjiao Sun, Paul Kelly, Engineering And Physical Sciences Research Council
In designing software tools for this task, one of the ultimate goals is to balance the needs for generality, ease of use and high performance.  ...  Domain-specific systems based on code generation techniques, such as Firedrake, attempt to address this problem with a design consisting of a hierarchy of abstractions, where the users can specify the  ...  They lead the research endeavour with such high standards and integrity that I am sure will benefit me for years to come, and for that, I am deeply grateful.  ... 
doi:10.25560/95186 fatcat:bc4x4fjba5d7vmzfqjwbyvegxq

Code Generation for High Performance PDE Solvers on Modern Architectures

Dominic Kempf
The space of these vectorization strategies is explored systematically from within the code generator in an autotuning procedure.  ...  General purpose compilers are not capable of autovectorizing traditional PDE simulation codes, requiring high performance implementations to explicitly spell out SIMD instructions.  ...  We observe that strategies based on kernel fusion perform strictly better in terms of DOFs throughput.  ... 
doi:10.11588/heidok.00027360 fatcat:sjn764xlsbcy7krvcw4so24q5q

2015 Jahresbericht Annual Report

Stefan Jähnichen, Raimund Seidel, Heike Meißner
Martin Rinard for his support and contribution to the organization of the seminar. We thank Sara Achour for her help with preparing the full report.  ...  The organizers would like to express their gratitude to the participants and the Schloss Dagstuhl team for a productive and exciting seminar. We thank Prof.  ...  and redundant links, and problems in trace granularity.  ...