11,171 Hits in 6.5 sec

Python Non-Uniform Fast Fourier Transform (PyNUFFT): multi-dimensional non-Cartesian image reconstruction package for heterogeneous platforms and applications to MRI [article]

Jyh-Miin Lin
2017 arXiv   pre-print
This paper reports the development of a Python Non-Uniform Fast Fourier Transform (PyNUFFT) package, which accelerates non-Cartesian image reconstruction on heterogeneous platforms.  ...  The PyNUFFT package has been tested on multi-core CPU and GPU, with acceleration factors of 6.3–9.5× on a 32-thread CPU platform and 5.4–13× on the GPU.  ...  The benchmarks were carried out on Amazon Web Services provided by AWS Educate credit. J.-M. Lin declares no conflict of interest.  ...
arXiv:1710.03197v1 fatcat:6z4eizuiujbznnh6nh3g6z6xzm

Python Non-Uniform Fast Fourier Transform (PyNUFFT): An Accelerated Non-Cartesian MRI Package on a Heterogeneous Platform (CPU/GPU)

Jyh-Miin Lin
2018 Journal of Imaging  
A Python non-uniform fast Fourier transform (PyNUFFT) package has been developed to accelerate multidimensional non-Cartesian image reconstruction on heterogeneous platforms.  ...  The PyNUFFT package has been tested on multi-core central processing units (CPUs) and graphics processing units (GPUs), with acceleration factors of 6.3–9.5× on a 32-thread CPU platform and 5.4–13× on a  ...  Introduction Fast Fourier transform (FFT) is an exact fast algorithm to compute the discrete Fourier transform (DFT) when data are acquired on an equispaced grid.  ...
doi:10.3390/jimaging4030051 fatcat:a63kgcae7vgr7d5wkgecuykdwa
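The PyNUFFT snippet above notes that the FFT computes the DFT exactly (not approximately) when samples lie on an equispaced grid. A minimal pure-Python sketch of that claim: a recursive radix-2 Cooley-Tukey FFT checked against the direct O(N²) DFT sum. It assumes power-of-two input lengths; the function names `dft` and `fft` are illustrative and this is not PyNUFFT code.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n_pts = len(x)
    if n_pts == 1:
        return [complex(x[0])]
    if n_pts % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # DFT of even-indexed samples
    odd = fft(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n_pts
    for k in range(n_pts // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n_pts) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n_pts // 2] = even[k] - twiddle
    return out


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
    err = max(abs(a - b) for a, b in zip(fft(signal), dft(signal)))
    print(f"max |FFT - DFT| = {err:.2e}")  # agreement up to floating-point rounding
```

The two results differ only by rounding error, while the FFT needs O(N log N) operations instead of O(N²). Non-uniform sample positions break the even/odd splitting, which is the gap NUFFT packages address.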

Energy Efficient Computation Method for CPU-GPU System on Chip

C. Sai Punitha
2018 International Journal for Research in Applied Science and Engineering Technology  
The improvement in performance gained by the use of a multi-core processor depends very much on the software algorithms used and their implementation.  ...  The increasing trend toward multi-core chips allows higher performance at lower energy, and communication between the cores is a limiting factor that can be improved by parallel computation such as  ...  Since each core in a multi-core CPU is generally more energy-efficient, a chip with many such cores is more efficient than one with a single large monolithic core.  ...
doi:10.22214/ijraset.2018.5199 fatcat:g3wqyatcdffa3n743oazsds7wi

Large Scale Parallelized 3D Mesoscopic Simulations Of The Mechanical Response To Shear In Disordered Media

Kirsten Martens
2015 Zenodo  
In this paper we describe the development of a code that implements a coarse-grained dynamics for the large-scale modeling of three-dimensional athermal yielding and flow of disordered systems under externally  ...  The stochastic lattice model for the heterogeneous flow response involves long-range elastic interactions that are resolved using fast Fourier techniques, implemented in parallel in an efficient and well  ...  The choice to perform the convolution in Fourier space allows one to profit from fast Fourier transform methods and convert the long-range interactions, which would imply a large number of CPU communications  ...
doi:10.5281/zenodo.825609 fatcat:mkzobksr35caxk7mh3aislglse
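The Martens snippet relies on the convolution theorem: a periodic (circular) convolution with a long-range kernel becomes a pointwise product in Fourier space. A minimal pure-Python sketch, assuming a 1D periodic lattice whose length is a power of two; the kernel `1 / (1 + distance)` is a hypothetical stand-in for a long-range interaction, and none of this is the paper's parallel code.

```python
import cmath


def fft(x, sign=-1):
    """Radix-2 Cooley-Tukey FFT (unnormalised); sign=+1 gives the inverse kernel."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def circular_convolve_direct(a, b):
    """O(N^2) periodic convolution: c[k] = sum_m a[m] * b[(k - m) mod N]."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def circular_convolve_fft(a, b):
    """O(N log N) periodic convolution via the convolution theorem (real inputs)."""
    n = len(a)
    product = [x * y for x, y in zip(fft(a), fft(b))]  # pointwise in Fourier space
    return [v.real / n for v in fft(product, sign=+1)]  # inverse FFT, normalised


if __name__ == "__main__":
    n = 8
    source = [0.5, -1.0, 2.0, 0.0, 1.5, 3.0, -0.5, 1.0]
    # Hypothetical long-range kernel on a periodic lattice.
    kernel = [1.0 / (1 + min(d, n - d)) for d in range(n)]
    print(circular_convolve_direct(source, kernel))
    print(circular_convolve_fft(source, kernel))
```

Because every site interacts with every other site, the direct sum is all-to-all; the FFT route replaces it with two forward transforms, a pointwise product and one inverse transform.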

Energy-Saving Task Scheduling Based on Hard Reliability Requirements: A Novel Approach with Low Energy Consumption and High Reliability

Qingfeng Chen, Yu Han, Jing Wu, Yu Gan
2022 Sustainability  
problem of DAG applications concerning energy-saving and hard reliability requirements in heterogeneous multi-core processor systems.  ...  With the increasing complexity of application situations in multi-core processing systems, how to assure task execution reliability has become a focus of scheduling algorithm research in recent years.  ...  The Fast Fourier Transform is an efficient algorithm for computing the discrete Fourier transform on a computer.  ...
doi:10.3390/su14116591 fatcat:jsgojo2r6vgtfbnbot22m2skvu

A simple spectral algorithm for solving large-scale Poisson equation in 2D

X Thibert-Plante, D.A Yuen, A.P Vincent
2003 Computer Physics Communications  
We have used a spectral Fourier technique and parallelized FFTs with OpenMP on SGI machines. This method can easily be extended to 3D.  ...  We show that it is possible, with easy-to-program algorithms, to reach spatial resolutions of the order of 10^8 grid points for computing the electric potential on 2D periodic lattices, such as the Si(111  ...  Xavier Thibert-Plante is a fellow of CRSNG Canada.  ...
doi:10.1016/s0010-4655(03)00283-2 fatcat:5vcw5rpycvh7pmrqct6gb2uyqi
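The Thibert-Plante et al. snippet describes solving a 2D periodic Poisson equation with a spectral Fourier technique. In Fourier space the Laplacian is multiplication by −(k_x² + k_y²), so the solve reduces to a division per mode. A minimal pure-Python sketch of that idea, assuming a square n × n grid with n a power of two, a periodic domain of side 2π and a zero-mean source; the helper names (`fft`, `fft2`, `solve_poisson_periodic`) are hypothetical, and this is not the authors' OpenMP code.

```python
import cmath
import math


def fft(x, sign=-1):
    """Radix-2 Cooley-Tukey FFT (unnormalised); sign=+1 gives the inverse kernel."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def fft2(grid, sign=-1):
    """2D transform: 1D FFT along every row, then along every column."""
    rows = [fft(row, sign) for row in grid]
    cols = [fft(list(col), sign) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def solve_poisson_periodic(f, length=2 * math.pi):
    """Solve laplacian(u) = f on an n x n periodic square of side `length`.

    u_hat = -f_hat / (kx^2 + ky^2). The k = 0 mode is set to zero, which fixes
    the free additive constant (mean of u = 0) and assumes f has zero mean.
    """
    n = len(f)
    # Wavenumbers in FFT order: 0, 1, ..., n/2 - 1, -n/2, ..., -1 (times 2*pi/length).
    freqs = [(i if i < n // 2 else i - n) * 2 * math.pi / length for i in range(n)]
    f_hat = fft2(f)
    u_hat = [[0j] * n for _ in range(n)]
    for i, ky in enumerate(freqs):      # row index ~ y
        for j, kx in enumerate(freqs):  # column index ~ x
            k2 = kx * kx + ky * ky
            if k2 > 0:
                u_hat[i][j] = -f_hat[i][j] / k2
    u = fft2(u_hat, sign=+1)
    return [[v.real / (n * n) for v in row] for row in u]


if __name__ == "__main__":
    n = 16
    xs = [2 * math.pi * j / n for j in range(n)]
    # Manufactured solution u = sin(x) cos(2y), so laplacian(u) = -5 sin(x) cos(2y).
    f = [[-5 * math.sin(x) * math.cos(2 * y) for x in xs] for y in xs]
    u = solve_poisson_periodic(f)
    err = max(abs(u[i][j] - math.sin(xs[j]) * math.cos(2 * xs[i]))
              for i in range(n) for j in range(n))
    print(f"max error = {err:.2e}")
```

For a band-limited source like this one the spectral solution is exact up to rounding; a production solver would use an optimised, parallel FFT library rather than this recursive transform.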

D12.1: Heterogeneous and Auto-tuned Runtime System

Christian Perez, Zhengxiong Hou, Judit Planas, Rosa Badia, Eduard Aygüadé, Jesus Labarta, Michael Schliephake, Chandan Basu, Johan Raber, Massimo Guarrasi, Lasse Natvig, Kostis Nikas (+5 others)
2013 Zenodo  
Task 12.1 contributes to improving the support of auto-tuning methods to face the complexity of existing and future large-scale systems.  ...  It impacts parallel languages, runtime, generic and kernel-specific auto-tuning algorithms, multi-core, many-core and multi-node systems, as well as batch systems and energy consumption measurement methods  ...  This complexity is particularly high within a node, with the emergence of large multi-core and/or many-core systems, leading to deep memory hierarchies and heterogeneous nodes.  ...
doi:10.5281/zenodo.6572371 fatcat:uttgomgovjeb5iopc2ccgyar7y

Improving HPC Application Performance in Cloud through Dynamic Load Balancing

A. Gupta, O. Sarood, L. V. Kale, D. Milojicic
2013 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
It infers the static hardware heterogeneity in virtualized environments, and also adapts to the dynamic heterogeneity caused by interference arising from multi-tenancy.  ...  Through experimental evaluation on a private cloud with 64 VMs using benchmarks and a real science application, we demonstrate performance benefits of up to 45%.  ...  VM (on a Fast core) which starts at iteration 50.  ...
doi:10.1109/ccgrid.2013.65 dblp:conf/ccgrid/GuptaSKM13 fatcat:poatj7saofdvdpijockg4ehmky

High Performance Graph Data Imputation on Multiple GPUs

Chao Zhou, Tao Zhang
2021 Future Internet  
Furthermore, we design a scheme to extend the GPU-optimized implementation to multiple GPUs for large-scale computing. Experimental results show that the GPU implementation is both fast and accurate.  ...  In this paper, we propose a scheme to perform the convolutional imputation algorithm with higher time performance on GPUs (Graphics Processing Units) by exploiting the many cores of CUDA-architecture GPUs.  ...  In the field of data mining, a variety of graph processing systems have been developed, from GraphChi [6], which is a CPU-based system for computing large-scale graphs on a single machine, to the multi-GPU  ...
doi:10.3390/fi13020036 fatcat:qv5vzegj3jcm3elsk5ssjsxedu

Impact of Kernel-Assisted MPI Communication over Scientific Applications: CPMD and FFTW [chapter]

Teng Ma, Aurelien Bouteiller, George Bosilca, Jack J. Dongarra
2011 Lecture Notes in Computer Science  
FFTW, a Discrete Fourier Transform (DFT).  ...  Our experiments indicate that the quality of the collective communication implementation on a specific machine plays a critical role in the overall application performance.  ...  FFTW, the "Fastest Fourier Transform in the West", is one of the most popular libraries for computing discrete Fourier transforms (DFTs).  ...
doi:10.1007/978-3-642-24449-0_28 fatcat:bnkvbrne4jebram52epyspqhre

heFFTe: Highly Efficient FFT for Exascale [chapter]

Alan Ayala, Stanimire Tomov, Azzam Haidar, Jack Dongarra
2020 Lecture Notes in Computer Science  
Currently, many diverse applications, such as those that are part of the Exascale Computing Project (ECP) in the United States, rely on efficient computation of the Fast Fourier Transform (FFT).  ...  A communication model for parallel FFTs is also provided to analyze the bottleneck for large-scale problems.  ...  Introduction Considered one of the top 10 algorithms of the 20th century, the Fast Fourier transform (FFT) is widely used by applications in science and engineering.  ...
doi:10.1007/978-3-030-50371-0_19 fatcat:hblqdlwkvjchpckx7npiqgtlyi

Accelerating Fast Fourier Transforms Using Hadoop and CUDA [article]

Rostislav Tsiomenko, Bradley S. Rees
2014 arXiv   pre-print
There has been considerable research into improving Fast Fourier Transform (FFT) performance through parallelization and optimization for specialized hardware.  ...  In this paper we present a unique approach that not only parallelizes the workload over multiple cores, but distributes the problem over a cluster of graphics processing unit (GPU)-equipped servers.  ...  We plan to extend our work to allow overlapping FFT operations to be performed in our distributed environment.  ...
arXiv:1407.6915v1 fatcat:a4xzuas3s5dp7h2bvtlv7zk53y

Introducing Scalable Quantum Approaches in Language Representation [chapter]

Peter Wittek, Sándor Darányi
2011 Lecture Notes in Computer Science  
The novel paradigm of general-purpose computing on graphics processors (GPGPU) offers a feasible and economical alternative: it has already become a common phenomenon in scientific computation, with many  ...  High-performance computational resources and distributed systems are crucial for the success of real-world language technology applications.  ...  Fast Fourier transformation on GPUs is a classical area for acceleration [57] .  ... 
doi:10.1007/978-3-642-24971-6_2 fatcat:emliiuolnzdtpnhfflc7wsmkde

2018 Index IEEE Transactions on Computers Vol. 67

2019 IEEE transactions on computers  
., TC June 2018 771-783 Faz-Hernandez, A., Lopez, J., Ochoa-Jimenez, E., and Rodriguez-Henriquez, F., A Faster Software Implementation of the Supersingular Isogeny Diffie-Hellman Key Exchange Protocol  ...  Feng, H., +, TC Feb. 2018 252-267 Fast Fourier transforms: A Scheme to Design Concurrent Error Detection Techniques for the Fast Fourier Transform Implemented in SRAM-Based FPGAs.  ...  Choi, I., +, TC Dec. 2018 1835-1839 Decision diagrams: Performability Analysis of Large-Scale Multi-State Computing Systems.  ...
doi:10.1109/tc.2018.2882120 fatcat:j2j7yw42hnghjoik2ghvqab6ti

Accurate, scalable and informative design space exploration for large and sophisticated multi-core oriented architectures

Chang-Burm Cho, J. Poe, Tao Li, Jingling Yuan
2009 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems  
We extensively evaluate the efficiency of our predictive models in forecasting the complex and heterogeneous characteristics of large, distributed shared caches interconnected by a network on chip in  ...  In this paper, we propose novel multi-scale 2D predictive models that can efficiently reason about the characteristics of large and sophisticated multi-core oriented architectures during the design space exploration  ...  ACKNOWLEDGMENT This work is supported in part by NSF CAREER Award CCF-0845721, and by Microsoft Research Safe and Scalable Multi-core Computing Award.  ...
doi:10.1109/mascot.2009.5366283 dblp:conf/mascots/ChoPLY09 fatcat:npsknlayezc65h6w4plxnjtcqu
Showing results 1–15 out of 11,171 results