A High Performance Implementation of Non-Power-of-Two FFT with EPUMA Platform

Zhenyu Liu, Qunfang Xie, Hongkai Wang, Yanjun Zhang, Dake Liu
2012 Procedia Engineering  
In this paper, three non-power-of-two points (1152, 1200, 1536) FFT are implemented with a parallel computing platform, ePUMA.  ...  The proposed implementation method can be applied to other points FFT or convolution algorithms.  ...  FFT algorithms and the implementation details will be introduced in section 3 and the performance will be given and evaluated in Section 4. A conclusion will be drawn in section 5.  ... 
doi:10.1016/j.proeng.2012.01.503 fatcat:frpkumpl4rfq5nhnj2pqdgpude

A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications

Taesang Cho, Hanho Lee, Jounsup Park, Chulgyun Park
2011 2011 IEEE International Symposium of Circuits and Systems (ISCAS)  
The proposed FFT processor has been designed and implemented with 90nm CMOS technology in a supply voltage of 1.2V.  ...  In this paper, we present a novel modified radix-2^5 algorithm for 512-point fast Fourier transform (FFT) computation and high-speed eight-parallel data-path architecture for multi-gigabit wireless personal  ...  using the modified radix-2^5 algorithm; high SQNR performance by using multidata scaling. A block diagram of the proposed FFT processor is shown in Fig. 1.  ... 
doi:10.1109/iscas.2011.5937799 dblp:conf/iscas/ChoLPP11 fatcat:rlwtfjrmhvgc5njs3rssil7cjq

Transpose-free variable-size FFT accelerator based on-chip SRAM

Lei Guo, Yuhua Tang, Yuanwu Lei, Yong Dou, Jie Zhou
2014 IEICE Electronics Express  
Several parallel schemes are utilized to calculate a batch of small-size FFT algorithms to achieve high performance and throughput.  ...  For middle- and large-size of FFT, we propose a transpose-free Cooley-Tukey scheme that uses the random access feature of on-chip SRAM memory to avoid the DDR access of matrix with column-wise and improves  ...  This module can independently implement a small-size FFT. .1 Parallel schemes for a batch of small-size FFTs A batch of small-size FFT algorithms is evaluated by several parallel computation schemes  ... 
doi:10.1587/elex.11.20140171 fatcat:h6ygih5gxnb73cd4p5iczwpidm

A high performance parallel algorithm for 1-D FFT

R. C. Agarwal, F. G. Gustavson, M. Zubair
1994 Supercomputing, Proceedings  
In this paper we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation.  ...  We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.  ...  In this paper, we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation.  ... 
doi:10.1145/602783.602784 fatcat:7kcgwpmmpveb5je63vwyhk34ne

A high performance parallel algorithm for 1-D FFT

R. C. Agarwal, F. G. Gustavson, M. Zubair
1994 Supercomputing, Proceedings  
In this paper we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation.  ...  We implemented this kernel on the IBM SP1 and observed a performance of 1.25 GFLOPS on a 64-node machine.  ...  In this paper, we propose a parallel high performance FFT algorithm based on a multi-dimensional formulation.  ... 
doi:10.1145/602770.602784 fatcat:cps3jihrxnbefe7vf3fp4r523m

NUMA-aware FFT-based Convolution on ARMv8 Many-core CPUs [article]

Xiandong Huang, Qinglin Wang, Shuyu Lu, Ruochen Hao, Songzhu Mei, Jie Liu
2021 arXiv   pre-print
The FFT-based algorithm can improve the efficiency of convolution by reducing its algorithm complexity, there are a lot of works about the high-performance implementation of FFT-based convolution on many-core  ...  The implementation can reduce a number of remote memory access through the data reordering of FFT transformations and the three-level parallelization of the complex matrix multiplication.  ...  Thus, there is a lot of work about studying high-performance implementation of FFT-based convolution algorithm on different platforms. Mathieu and Vasilache et al.  ... 
arXiv:2109.12259v1 fatcat:egp7ugkv4fedzjvqfuxcuf4244

Review of Parallel Polynomial Multiplier based on FFT using Indian Vedic Mathematics

Shilpa Jumde, R. N. Mandavgane, D. M. Khatri
2015 International Journal of Computer Applications  
In general, most of the operations performed by any complex system need a multiplier. Hence, multiplier based on FFT is the desired aim.  ...  In this paper, we have presented a review of parallel polynomial multiplier based on FFT using Indian Vedic mathematics.  ...  Different FFT algorithms, like the Radix-4 and the Split-Radix FFT algorithm, used to reduce the number of computations [3].  ... 
doi:10.5120/19756-1379 fatcat:yhrgzw7qffhnjmjqqpr5kmqyye

FPGA based Reconfigurable 2D FFT System

Shefa A. Dawwd, Ahmad F. Al-allaf
2011 Al-Rafidain Engineering Journal  
The system employs two one-dimensional (1D) FFT processors, each with sixteen reconfigurable parallel FFT cores. Each core represents a 16 complex point parallel FFT engine.  ...  The adopted approach considers both the hardware cost (in terms of FPGA resource requirements), and performance (in terms of throughput).  ...  Despite that, high performance, large-scale DSP algorithms still cannot fit in a single FPGA and require careful design considerations.  ... 
doi:10.33899/rengj.2011.27026 fatcat:qjfm35y4xvbw3clypyk2hluxoy

CROFT: A scalable three-dimensional parallel Fast Fourier Transform (FFT) implementation for High Performance Clusters [article]

Vivek Gavane, Supriya Prabhugawankar, Shivam Garg, Archana Achalere, Rajendra Joshi
2020 arXiv   pre-print
The FFT of three-dimensional (3D) input data is an important computational kernel of numerical simulations and is widely used in High Performance Computing (HPC) codes running on a large number of processors  ...  Performance of many scientific applications such as Molecular Dynamic simulations depends on the underlying 3D parallel FFT library being used.  ...  This is achieved using step 2, 3 and 4 of the algorithm.  ... 
arXiv:2002.04896v2 fatcat:nc4dope5ibcdjj6qnvgentdhwu

Benchmarking Parallel Three Dimensional FFT Kernels with ZENTURIO [chapter]

Radu Prodan, Andreas Bonelli, Andreas Adelmann, Thomas Fahringer, Christoph Überhuber
2004 Lecture Notes in Computer Science  
In this paper we describe a systematic methodology for benchmarking parallel application kernels using the ZENTURIO experiment management tool.  ...  Performance of parallel scientific applications is often heavily influenced by various mathematical kernels like linear algebra software that needs to be highly optimised for each particular platform.  ...  A comparative analysis of the two FFT parallel algorithms shows, as expected, a better performance of wpp3DFFT compared to FFTW for large problem sizes, which is due to the highly optimised wpp3DFFT transpose  ... 
doi:10.1007/978-3-540-24687-9_58 fatcat:mxs33hrlqzhwvk2bceimntpioe

An Input-Adaptive Algorithm for High Performance Sparse Fast Fourier Transform [chapter]

Shuo Chen, Xiaoming Li
2014 Lecture Notes in Computer Science  
In particular, our performance is compared to that of the SSE-enabled FFTW and to the results of a highly-influential recently proposed sparse Fourier algorithm.  ...  Compared with the "dense" FFT algorithms, the input sparsity makes it easier to parallelize the sparse counterparts.  ...  Fig.6 shows the parallel versions of our sparse FFT on three high performance GPUs.  ... 
doi:10.1007/978-3-319-09967-5_15 fatcat:tmjyxko23jeojmjglippi7waha

PFFT: An Extension of FFTW to Massively Parallel Architectures

Michael Pippig
2013 SIAM Journal on Scientific Computing  
For example, we provide performance measurements of FFTs of size 512^3 and 1024^3 up to 262144 cores on a BlueGene/P architecture.  ...  Similar to established transpose FFT algorithms, we propose a parallel FFT framework that is based on a combination of local FFTs, local data permutations and global data transpositions.  ...  Furthermore, we gratefully acknowledge the help of Ralf Wildenhues and Michael Hofmann on the PFFT build system.  ... 
doi:10.1137/120885887 fatcat:34j6qj75ibf3vc2mippksmfpea

FFT for the APE Parallel Computer

Thomas Lippert, Klaus Schilling, Sven Trentmann, Federico Toschi, Raffaele Tripiccione
1997 International Journal of Modern Physics C  
We present a parallel FFT algorithm for SIMD systems following the 'Transpose Algorithm' approach. The method is based on the assignment of the data field onto a 1-dimensional ring of systolic cells.  ...  We have realized a scalable parallel FFT on the APE100/Quadrics massively parallel computer, where our implementation is part of a 2-dimensional hydrodynamics code for turbulence studies.  ...  The numerical tests in this work have been carried out on a 512-node QH4 system at INFN, Pisa, Italy and the 256-node QH2 and 128-node QH1 at the University of Bielefeld, Germany.  ... 
doi:10.1142/s012918319700117x fatcat:yu4gsn7vpndvvcstyn3t3jij5i

Overview of Parallel Platforms for Common High Performance Computing

T. Fryza, J. Svobodova, F. Adamec, R. Marsalek, J. Prokopec
2012 Radioengineering  
The paper deals with various parallel platforms used for high performance computing in the signal processing domain.  ...  New FFT and DCT implementations were proposed and tested.  ...  Research published in this paper was also financially supported by the project CZ.1.07/2.3.00/20.0007 WICOMT of the operational program Education for compet-  ... 
doaj:00c6ffebfc054375bdd1bb8dbfd13e7f fatcat:x265jgp7ibejhomntupv3lj6nm

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs [article]

Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong, Deming Chen
2018 arXiv   pre-print
In this work, we explore different fast convolution algorithms including Winograd and Fast Fourier Transform (FFT), and find an optimal strategy to apply them together on different types of convolutions  ...  We implement a configurable IP-based face recognition acceleration system based on FaceNet using High-Level Synthesis.  ...  ACKNOWLEDGMENTS This work is supported by IBM-Illinois Center for Cognitive Computing Systems Research (C3SR), a research collaboration as part of the IBM AI Horizons Network.  ... 
arXiv:1803.09004v1 fatcat:5vs3hyqbl5ec7acd4yfh3abic4