A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf
.
Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths
[article]
2021
arXiv
pre-print
The prohibitive expense of automatic performance tuning at scale has largely limited the use of autotuning to libraries for shared-memory and GPU architectures. We introduce a framework for approximate autotuning that achieves a desired confidence in each algorithm configuration's performance by constructing confidence intervals to describe the performance of individual kernels (subroutines of benchmarked programs). Once a kernel's performance is deemed sufficiently predictable for a set of
arXiv:2103.01304v1
fatcat:64gtczdht5fqrhbonxcs7ikc34