Sparse Fourier Transform by traversing Cooley-Tukey FFT computation graphs [article]

Karl Bringmann, Michael Kapralov, Mikhail Makarov, Vasileios Nakos, Amir Yagudin, Amir Zandieh
2021 arXiv pre-print
Our robust algorithm can be viewed as a highly optimized sparse, stable extension of the Cooley-Tukey FFT algorithm.  ...  In the Sparse Fast Fourier Transform (Sparse FFT) problem, one is given oracle access to a d-dimensional vector x of size N, and is asked to compute the best k-term approximation of its Discrete Fourier  ...  Our techniques: new methods for traversing pruned Cooley-Tukey FFT computation graphs.  ... 
arXiv:2107.07347v1 fatcat:p735ty2nt5hefizqbhqips7r3u
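
The snippet defines the Sparse FFT target as the best k-term approximation of the DFT. Below is a minimal brute-force sketch of that target for a 1-D vector (the paper treats d-dimensional inputs), using only Python's standard library. It computes the full DFT in O(N^2) time and keeps the k largest-magnitude coefficients. The function names are hypothetical. The paper's contribution is approximating this output in sublinear time, which this sketch does not attempt.

```python
import cmath
import heapq

def dft(x):
    """Naive O(N^2) DFT: X[k] = sum_j x[j] * exp(-2*pi*i*j*k/N)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def best_k_term(x, k):
    """Return {frequency: coefficient} for the k largest-magnitude DFT coefficients.

    This is the quantity a sparse FFT approximates. Here it is computed by brute force
    from the full spectrum, not in sublinear time.
    """
    X = dft(x)
    top = heapq.nlargest(k, range(len(X)), key=lambda f: abs(X[f]))
    return {f: X[f] for f in sorted(top)}
```

Consider a length-16 signal built from complex exponentials at frequencies 3 and 11 with amplitudes 2 and 0.5. For this input, `best_k_term(x, 2)` returns keys 3 and 11 with coefficients close to 32 and 8, because a pure exponential at frequency f contributes N times its amplitude to bin f.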

Generating Optimized Fourier Interpolation Routines for Density Functional Theory Using SPIRAL

Doru Thom Popovici, Francis P. Russell, Karl Wilkinson, Chris-Kriton Skylaris, Paul H. J. Kelly, Franz Franchetti
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
using a frequency domain Fourier property can be a good choice.  ...  For small upsampling factors as seen in the quantum chemistry code ONETEP, a time-shift based implementation that shifts samples by a fraction of the original grid spacing to fill in the intermediate values  ...  We now derive the loopable short vector Cooley Tukey FFT for odd sizes.  ... 
doi:10.1109/ipdps.2015.112 dblp:conf/ipps/PopoviciRWSKF15 fatcat:fwhzbligiraerpnzcivtt6lihe
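
The snippet contrasts a frequency-domain approach to Fourier interpolation with a time-shift approach for small upsampling factors. The sketch below shows the frequency-domain variant under stated assumptions. It zero-pads the spectrum of periodic, band-limited samples and then transforms back. This is neither SPIRAL-generated code nor ONETEP's routine. The function names are hypothetical, and a naive O(N^2) DFT stands in for a real FFT.

```python
import cmath

def dft(x, sign=-1):
    """Naive unnormalized DFT; sign=-1 is the forward transform, sign=+1 the inverse."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fourier_upsample(x, factor):
    """Trigonometric interpolation of periodic samples x onto a grid `factor` times finer."""
    n = len(x)
    m = n * factor
    X = dft(x)
    Y = [0j] * m
    pos = (n - 1) // 2  # highest frequency copied unchanged on each side
    for k in range(pos + 1):
        Y[k] = X[k]  # frequencies 0..pos
    for k in range(1, pos + 1):
        Y[m - k] = X[n - k]  # frequencies -1..-pos
    if n % 2 == 0:
        # Split the Nyquist bin between +n/2 and -n/2 so real input stays real
        Y[n // 2] += X[n // 2] / 2
        Y[m - n // 2] += X[n // 2] / 2
    y = dft(Y, sign=+1)
    # The inverse DFT needs 1/m; also scaling by `factor` gives 1/n overall
    return [v / n for v in y]
```

As an example, upsampling the five samples of cos(2*pi*j/5) by a factor of 3 reproduces cos(2*pi*i/15) at all 15 points, and every third output equals an original sample. A time-shift implementation would instead compute only the intermediate points, which the snippet suggests is preferable for small factors.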

Bandit-based optimization on graphs with application to library performance tuning

Frédéric de Mesmay, Arpad Rimmel, Yevgen Voronenko, Markus Püschel
2009 Proceedings of the 26th Annual International Conference on Machine Learning - ICML '09  
The problem of choosing fast implementations for a class of recursive algorithms such as the fast Fourier transforms can be formulated as an optimization problem over the language generated by a suitably  ...  We propose a novel algorithm that solves this problem by reducing it to maximizing an objective function over the sinks of a directed acyclic graph.  ...  This work was supported by NSF through awards 0325687, 0702386, by DARPA (DOI grant NBCH-1050009), the ARO grant W911NF0710416, and by Intel.  ... 
doi:10.1145/1553374.1553468 dblp:conf/icml/MesmayRVP09 fatcat:mmux23to5nbvpppjopbf5zxtaa

Scheduling FFT computation on SMP and multicore systems

Ayaz Ali, Lennart Johnsson, Jaspal Subhlok
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
We have developed a portable framework for the Fast Fourier Transform (FFT) that achieves high efficiency by automatically adapting to various architectural features.  ...  In this paper, we develop heuristics to simplify the generation of better schedules for parallel FFT computations on CMP/SMP systems.  ...  BACKGROUND FFT The Fast Fourier Transform (FFT) is a divide and conquer algorithm for quick evaluation of the Discrete Fourier Transform (DFT).  ... 
doi:10.1145/1274971.1275011 dblp:conf/ics/AliJS07 fatcat:ngkx3wztgzanvgg3ybakgh4bva
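
The background snippet describes the FFT as a divide-and-conquer algorithm for evaluating the DFT. Below is a minimal recursive radix-2 Cooley-Tukey sketch in Python, assuming power-of-two lengths. It illustrates the split into even- and odd-indexed halves. It is not the adaptive, architecture-tuned framework the paper describes, and the function name is hypothetical.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # DFT of the even-indexed samples (size n/2)
    odd = fft(x[1::2])   # DFT of the odd-indexed samples (size n/2)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t           # butterfly: combine the two half-size results
        out[k + n // 2] = even[k] - t
    return out
```

For example, `fft([0, 1, 0, 0])` returns approximately `[1, -1j, -1, 1j]`, which is the DFT of a unit impulse at index 1. Each level halves the problem and does O(N) combining work, giving O(N log N) overall.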

Indigo: A Domain-Specific Language for Fast, Portable Image Reconstruction

Michael Driscoll, Benjamin Brock, Frank Ong, Jonathan Tamir, Hsiou-Yuan Liu, Michael Lustig, Armando Fox, Katherine Yelick
2018 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)  
Linear operators used in iterative methods like conjugate gradient have typically been implemented either as "matrix-driven" subroutines backed by explicit sparse or dense matrices, or as "matrix-free"  ...  This representation enables expert-guided reordering and fusion transformations that can improve performance or reduce memory pressure.  ...  ACKNOWLEDGMENTS This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S.  ... 
doi:10.1109/ipdps.2018.00059 dblp:conf/ipps/DriscollBOTLLFY18 fatcat:o5alcooghbcrfmnlxntj5u5yf4

A framework for low-communication 1-D FFT

Ping Tak Peter Tang, Jongsoo Park, Daehyun Kim, Vladimir Petrov
2012 2012 International Conference for High Performance Computing, Networking, Storage and Analysis  
For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters.  ...  In this paper, we present a mathematical framework from which many single-all-to-all and easy-to-implement 1-D FFT algorithms can be derived.  ...
doi:10.1109/sc.2012.5 dblp:conf/sc/TangPKP12 fatcat:xolo6xtra5fv5pdmytwha7mxvu

A Framework for Low-Communication 1-D FFT

Ping Tak Peter Tang, Jongsoo Park, Daehyun Kim, Vladimir Petrov
2013 Scientific Programming  
For large-scale problems, our implementation can be twice as fast as leading FFT libraries on state-of-the-art computer clusters.  ...  In this paper, we present a mathematical framework from which many single-all-to-all and easy-to-implement 1-D FFT algorithms can be derived.  ...
doi:10.1155/2013/672424 fatcat:2ogumqwysjeurfbkj5qreseeya

Cache-efficient numerical algorithms using graphics hardware

Naga K. Govindaraju, Dinesh Manocha
2007 Parallel Computing  
We use this approach to improve the performance of GPU-based sorting, fast Fourier transform and dense matrix multiplication algorithms.  ...  Our approach achieves high memory performance on GPUs by tiling the computation and thereby improving the cache-efficiency.  ...  This work is supported in part by ARO Contracts DAAD19-02-1-0390 and W911NF-04- We would also like to thank Whitney Vaughan and other members of UNC GAMMA group for useful suggestions and support.  ... 
doi:10.1016/j.parco.2007.09.006 fatcat:q4bhwxqgczau3d633elml5iywy
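
The snippet attributes the GPU speedups to tiling the computation so that it reuses cached data. The sketch below is a CPU illustration of that loop structure, written for a plain Python dense matrix multiply. It is not the authors' GPU code, the tile size parameter is an assumption, and pure Python will not show the memory-performance gains. It only shows how the iteration space is blocked.

```python
def tiled_matmul(a, b, tile=2):
    """Compute C = A @ B for list-of-lists matrices, one tile at a time."""
    n, inner, m = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles of C and over tiles of the shared dimension
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, inner, tile):
                # Inner loops touch only a tile-by-tile block of A, B and C: the working
                # set that a cache (or GPU on-chip memory) would need to hold
                for i in range(i0, min(i0 + tile, n)):
                    for k in range(k0, min(k0 + tile, inner)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + tile, m)):
                            c[i][j] += aik * b[k][j]
    return c
```

The result is identical to the untiled triple loop for any tile size. Only the order in which elements are visited changes, and that order determines the cache efficiency.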

A survey of out-of-core algorithms in numerical linear algebra [chapter]

Sivan Toledo
1999 External Memory Algorithms  
The survey covers out-of-core algorithms for solving dense systems of linear equations, for the direct and iterative solution of sparse systems, for computing eigenvalues, for fast Fourier transforms,  ...  and for N-body computations.  ...  This research was performed at the Xerox Palo Alto Research Center and supported in part by DARPA contract number DABT63-95-C-0087 and by NSF contract number ASC-96-26298.  ... 
doi:10.1090/dimacs/050/09 dblp:conf/dimacs/Toledo98 fatcat:6igwt7kzubemrhlih6hbpehrtm

Dimension-independent Sparse Fourier Transform [article]

Michael Kapralov, Ameya Velingker, Amir Zandieh
2019 arXiv pre-print
The Discrete Fourier Transform (DFT) is a fundamental computational primitive, and the fastest known algorithm for computing the DFT is the FFT (Fast Fourier Transform) algorithm.  ...  The state of the art for Sparse FFT, i.e. the problem of computing the DFT of a signal that has at most k nonzeros in Fourier domain, is very different: all current techniques for sublinear time computation  ...  Acknowledgements Michael Kapralov is supported in part by ERC Starting Grant 759471.  ... 
arXiv:1902.10633v1 fatcat:ibypgh5obzd4xajbycx3c43rkq

Snow White Clouds and the Seven Dwarfs

Stephen C. Phillips, Vegard Engen, Juri Papay
2011 2011 IEEE Third International Conference on Cloud Computing Technology and Science  
This is reflected well by Dwarf benchmarks and we show how different applications correlate more strongly with different Dwarfs, leading to the possibility of using Dwarf benchmark scores as parameters  ...  We show that different hardware is better suited for different types of computations and, thus, the relative performance of applications varies across hardware.  ...  [Table excerpt. Spectral: Stockham FFT (1048576, 2073), Cooley-Tukey FFT (1048576, 3727), Four-Step FFT (1048576, 1978). Particles: 2D N-Body Cutoff, 2D N-Body Barnes-Hut, 3D N-Body Cutoff, 3D N-Body Barnes-Hut (2000 ...)]  ...
doi:10.1109/cloudcom.2011.114 dblp:conf/cloudcom/PhillipsEP11 fatcat:tpnfnueun5aczi3mylzilrv3gu
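
The benchmark table excerpt names a Four-Step FFT among the spectral kernels. The sketch below shows the textbook four-step decomposition for N = n1 * n2: short DFTs over strided subsequences, a twiddle-factor multiply, short DFTs across the other dimension, and a transposed write-out. The snippet does not say whether the benchmark implements exactly this variant. The function names are hypothetical, and the inner transforms use a naive DFT for brevity.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT used for the short inner transforms."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """DFT of x with len(x) == n1 * n2, using input index j = n2*j1 + j2."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # Step 1: n2 DFTs of length n1 over the strided subsequences x[j2], x[j2 + n2], ...
    a = [dft(x[j2::n2]) for j2 in range(n2)]  # a[j2][k1]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i*j2*k1/N)
    for j2 in range(n2):
        for k1 in range(n1):
            a[j2][k1] *= cmath.exp(-2j * cmath.pi * j2 * k1 / n)
    out = [0j] * n
    for k1 in range(n1):
        # Step 3: n1 DFTs of length n2 across j2
        col = dft([a[j2][k1] for j2 in range(n2)])  # indexed by k2
        # Step 4: transposed write-out, X[k1 + n1*k2]
        for k2 in range(n2):
            out[k1 + n1 * k2] = col[k2]
    return out
```

This decomposition is attractive for large sizes such as the 1048576 listed in the table, because each short transform can fit in fast memory. The twiddle multiply and the transpose are the only steps that couple the two dimensions.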

Data Dwarfs: A Lens Towards Fully Understanding Big Data and AI Workloads [article]

Wanling Gao, Jianfeng Zhan, Lei Wang, Chunjie Luo, Daoyi Zheng, Fei Tang, Biwei Xie, Chen Zheng, Qiang Yang
2018 arXiv pre-print
For the first time, among a wide variety of big data and AI workloads, we identify eight data dwarfs that take up most of the run time, including Matrix, Sampling, Logic, Transform, Set, Graph, Sort and Statistic  ...  We consider each big data and AI workload as a pipeline of one or more classes of units of computation performed on different initial or intermediate data inputs.  ...  traversal and FFT transformation.  ...
arXiv:1802.00699v2 fatcat:wkmaavozl5firpbpk5m4ubmunq

A Memory Model for Scientific Algorithms on Graphics Processors

Naga Govindaraju, Scott Larsen, Jim Gray, Dinesh Manocha
2006 ACM/IEEE SC 2006 Conference (SC'06)  
In order to demonstrate the effectiveness of our model, we highlight its performance on three memory-intensive scientific applications -sorting, fast Fourier transform and dense matrix-multiplication.  ...  Our memory model is based on texturing hardware, which uses a 2D block-based array representation to perform the underlying computations.  ...  Acknowledgements This work is supported in part by ARO Contracts DAAD19-02-1-0390 and W911NF-04-1-0088, NSF awards 0400134 and 0118743, DARPA/RDECOM Contract N61339-04-C-0043, ONR Contract N00014-01-1-  ... 
doi:10.1109/sc.2006.2 fatcat:khuppw4ib5bltfweweg6dhx22i

Spiral in scala

Georg Ofenbeck, Tiark Rompf, Alen Stojanov, Martin Odersky, Markus Püschel
2013 Proceedings of the 12th international conference on Generative programming: concepts & experiences - GPCE '13  
Spiral is a library generator for linear transforms such as the discrete Fourier transform (DFT). The version we consider here gener-  ...  The first main contribution of this paper is the realization of c) using type classes to abstract over staging decisions, i.e. which pieces of a computation are performed immediately and for which pieces  ...  Fast Fourier transforms (FFTs).  ... 
doi:10.1145/2517208.2517228 dblp:conf/gpce/OfenbeckRSOP13 fatcat:buvxb5h6qjdfbfmjtenq2luxgq

MachSuite: Benchmarks for accelerator design and customized architectures

Brandon Reagen, Robert Adolf, Yakun Sophia Shao, Gu-Yeon Wei, David Brooks
2014 2014 IEEE International Symposium on Workload Characterization (IISWC)  
We illustrate these aspects by characterizing each benchmark along five different dimensions, highlighting trends and salient features.  ...  The canonical Cooley-Tukey "butterfly" method we use is characterized by a wide range of strided access patterns and nested, triangular loop structures.  ...  Sparse matrices often appear when solving systems of dependent equations or computing properties on high-diameter graphs.  ... 
doi:10.1109/iiswc.2014.6983050 dblp:conf/iiswc/ReagenASWB14 fatcat:oijgahsvavczzchn7mp4jaj23y
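
The snippet characterizes the Cooley-Tukey "butterfly" kernel by its strided access patterns and nested loops. Below is a minimal iterative, in-place radix-2 sketch in Python that exposes that structure: a bit-reversal pass, then log2(N) stages in which the butterfly span doubles. This is not MachSuite's source code, and the function name is hypothetical.

```python
import cmath

def fft_iterative(x):
    """In-place iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    a = [complex(v) for v in x]
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    # Bit-reversal permutation so the butterflies can run in place
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) stages; the distance between butterfly partners doubles each stage
    span = 1
    while span < n:
        w_step = cmath.exp(-2j * cmath.pi / (2 * span))
        for start in range(0, n, 2 * span):  # groups of butterflies
            w = 1 + 0j
            for k in range(span):            # butterflies within a group
                u = a[start + k]
                t = w * a[start + k + span]
                a[start + k] = u + t
                a[start + k + span] = u - t
                w *= w_step
        span *= 2
    return a
```

As the span grows, the number of groups shrinks by the same factor. The loop bounds and the memory stride therefore change from stage to stage, which is the kind of access pattern the benchmark description highlights.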