A High Performance Implementation of Non-Power-of-Two FFT with EPUMA Platform
2012
Procedia Engineering
In this paper, three non-power-of-two FFTs (1152, 1200, and 1536 points) are implemented on a parallel computing platform, ePUMA. ...
The proposed implementation method can be applied to FFTs of other sizes or to convolution algorithms. ...
FFT algorithms and the implementation details will be introduced in section 3 and the performance will be given and evaluated in Section 4. A conclusion will be drawn in section 5. ...
doi:10.1016/j.proeng.2012.01.503
fatcat:frpkumpl4rfq5nhnj2pqdgpude
A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications
2011
2011 IEEE International Symposium of Circuits and Systems (ISCAS)
The proposed FFT processor has been designed and implemented in 90 nm CMOS technology with a supply voltage of 1.2 V. ...
In this paper, we present a novel modified radix-2^5 algorithm for 512-point fast Fourier transform (FFT) computation and high-speed eight-parallel data-path architecture for multi-gigabit wireless personal ...
using the modified radix-2^5 algorithm • high SQNR performance by using multidata scaling. A block diagram of the proposed FFT processor is shown in Fig. 1. ...
doi:10.1109/iscas.2011.5937799
dblp:conf/iscas/ChoLPP11
fatcat:rlwtfjrmhvgc5njs3rssil7cjq
Transpose-free variable-size FFT accelerator based on-chip SRAM
2014
IEICE Electronics Express
Several parallel schemes are utilized to calculate a batch of small-size FFT algorithms to achieve high performance and throughput. ...
For middle- and large-size FFTs, we propose a transpose-free Cooley-Tukey scheme that uses the random access feature of on-chip SRAM memory to avoid the DDR access of matrix with column-wise and improves ...
This module can independently implement a small-size FFT.
.1 Parallel schemes for a batch of small-size FFTs A batch of small-size FFT algorithms is evaluated by several parallel computation schemes ...
doi:10.1587/elex.11.20140171
fatcat:h6ygih5gxnb73cd4p5iczwpidm
A high performance parallel algorithm for 1-D FFT
1994
Supercomputing, Proceedings
In this paper we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. ...
We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine. ...
doi:10.1145/602783.602784
fatcat:7kcgwpmmpveb5je63vwyhk34ne
A high performance parallel algorithm for 1-D FFT
1994
Supercomputing, Proceedings
In this paper we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation. ...
We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine. ...
doi:10.1145/602770.602784
fatcat:cps3jihrxnbefe7vf3fp4r523m
NUMA-aware FFT-based Convolution on ARMv8 Many-core CPUs
[article]
2021
arXiv
pre-print
The FFT-based algorithm can improve the efficiency of convolution by reducing its algorithmic complexity, and there is a lot of work on the high-performance implementation of FFT-based convolution on many-core ...
The implementation can reduce the number of remote memory accesses through the data reordering of FFT transformations and the three-level parallelization of the complex matrix multiplication. ...
Thus, there is a lot of work about studying high-performance implementation of FFT-based convolution algorithm on different platforms. Mathieu and Vasilache et al. ...
arXiv:2109.12259v1
fatcat:egp7ugkv4fedzjvqfuxcuf4244
Review of Parallel Polynomial Multiplier based on FFT using Indian Vedic Mathematics
2015
International Journal of Computer Applications
In general, most of the operations performed by any complex system need a multiplier. Hence, multiplier based on FFT is the desired aim. ...
In this paper, we have presented a review of parallel polynomial multiplier based on FFT using Indian Vedic mathematics. ...
Different FFT algorithms, like the Radix-4 and the Split-Radix FFT algorithm, are used to reduce the number of computations [3]. ...
doi:10.5120/19756-1379
fatcat:yhrgzw7qffhnjmjqqpr5kmqyye
FPGA based Reconfigurable 2D FFT System
2011
Al-Rafidain Engineering Journal
The system employs two one-dimensional (1D) FFT processors, each with sixteen reconfigurable parallel FFT cores. Each core represents a 16 complex point parallel FFT engine. ...
The adopted approach considers both the hardware cost (in terms of FPGA resource requirements), and performance (in terms of throughput). ...
Despite that, high-performance, large-scale DSP algorithms still cannot fit in a single FPGA and require careful design considerations. ...
doi:10.33899/rengj.2011.27026
fatcat:qjfm35y4xvbw3clypyk2hluxoy
CROFT: A scalable three-dimensional parallel Fast Fourier Transform (FFT) implementation for High Performance Clusters
[article]
2020
arXiv
pre-print
The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors ...
Performance of many scientific applications such as Molecular Dynamic simulations depends on the underlying 3D parallel FFT library being used. ...
This is achieved using steps 2, 3 and 4 of the algorithm. ...
arXiv:2002.04896v2
fatcat:nc4dope5ibcdjj6qnvgentdhwu
Benchmarking Parallel Three Dimensional FFT Kernels with ZENTURIO
[chapter]
2004
Lecture Notes in Computer Science
In this paper we describe a systematic methodology for benchmarking parallel application kernels using the ZENTURIO experiment management tool. ...
Performance of parallel scientific applications is often heavily influenced by various mathematical kernels like linear algebra software that needs to be highly optimised for each particular platform. ...
A comparative analysis of the two FFT parallel algorithms shows, as expected, a better performance of wpp3DFFT compared to FFTW for large problem sizes, which is due to the highly optimised wpp3DFFT transpose ...
doi:10.1007/978-3-540-24687-9_58
fatcat:mxs33hrlqzhwvk2bceimntpioe
An Input-Adaptive Algorithm for High Performance Sparse Fast Fourier Transform
[chapter]
2014
Lecture Notes in Computer Science
In particular, our performance is compared to that of the SSE-enabled FFTW and to the results of a highly-influential recently proposed sparse Fourier algorithm. ...
Compared with the "dense" FFT algorithms, the input sparsity makes it easier to parallelize the sparse counterparts. ...
Fig.6 shows the parallel versions of our sparse FFT on three high performance GPUs. ...
doi:10.1007/978-3-319-09967-5_15
fatcat:tmjyxko23jeojmjglippi7waha
PFFT: An Extension of FFTW to Massively Parallel Architectures
2013
SIAM Journal on Scientific Computing
For example, we provide performance measurements of FFTs of size 512^3 and 1024^3 up to 262144 cores on a BlueGene/P architecture. ...
Similar to established transpose FFT algorithms, we propose a parallel FFT framework that is based on a combination of local FFTs, local data permutations and global data transpositions. ...
Furthermore, we gratefully acknowledge the help of Ralf Wildenhues and Michael Hofmann on the PFFT build system. ...
doi:10.1137/120885887
fatcat:34j6qj75ibf3vc2mippksmfpea
FFT for the APE Parallel Computer
1997
International Journal of Modern Physics C
We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells. ...
We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies. ...
The numerical tests in this work have been carried out on a 512-node QH4 system at INFN, Pisa, Italy and the 256-node QH2 and 128-node QH1 at the University of Bielefeld, Germany. ...
doi:10.1142/s012918319700117x
fatcat:yu4gsn7vpndvvcstyn3t3jij5i
Overview of Parallel Platforms for Common High Performance Computing
2012
Radioengineering
The paper deals with various parallel platforms used for high performance computing in the signal processing domain. ...
New FFT and DCT implementations were proposed and tested. ...
Research published in this paper was also financially supported by the project CZ.1.07/2.3.00/20.0007 WICOMT of the operational program Education for compet- ...
doaj:00c6ffebfc054375bdd1bb8dbfd13e7f
fatcat:x265jgp7ibejhomntupv3lj6nm
Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs
[article]
2018
arXiv
pre-print
In this work, we explore different fast convolution algorithms including Winograd and Fast Fourier Transform (FFT), and find an optimal strategy to apply them together on different types of convolutions ...
We implement a configurable IP-based face recognition acceleration system based on FaceNet using High-Level Synthesis. ...
ACKNOWLEDGMENTS This work is supported by IBM-Illinois Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizons Network. ...
arXiv:1803.09004v1
fatcat:5vs3hyqbl5ec7acd4yfh3abic4