
Auto-Tuning of the FFTW Library for Massively Parallel Supercomputers

Massimiliano Guarrasi
2013 Zenodo  
In particular, we have compared the performance of the standard Slab Decomposition algorithm already present with that obtained using the 2D Domain Decomposition and we found that on massively parallel  ...  supercomputers the performance of this new algorithm is significantly higher.  ...  Autotuning of FFTW Library for Massively Parallel Supercomputers  ... 
doi:10.5281/zenodo.807089 fatcat:722hwjgz7be6rnkg3k2rgp7r4e
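The scaling advantage of a 2D (pencil) decomposition over a slab decomposition can be seen from simple counting. For an n x n x n grid, a slab decomposition gives each rank at least one whole plane, so it cannot use more than n ranks. A pencil decomposition on a p1 x p2 process grid gives each rank at least one line, so it can use up to n² ranks. The helper names below are hypothetical, and the sketch assumes p1 and p2 divide n evenly. It is illustrative only and is not taken from the paper's code.

```python
def max_ranks(n: int, scheme: str) -> int:
    """Largest number of ranks that can each own at least one slab or
    pencil of an n x n x n grid (hypothetical helper)."""
    if scheme == "slab":
        return n          # at most one plane per rank
    if scheme == "pencil":
        return n * n      # at most one line per rank
    raise ValueError(f"unknown scheme: {scheme}")


def local_shape(n: int, p1: int, p2: int = 1) -> tuple:
    """Local block held by each rank before the first 1D FFT pass on a
    p1 x p2 process grid. p2 == 1 reduces to a slab decomposition.
    Assumes p1 and p2 divide n (hypothetical helper)."""
    if n % p1 or n % p2:
        raise ValueError("this sketch assumes p1 and p2 divide n")
    return (n // p1, n // p2, n)


# Usage: a 512^3 grid
print(max_ranks(512, "slab"))       # slab decomposition caps out at 512 ranks
print(max_ranks(512, "pencil"))     # pencils allow up to 512 * 512 ranks
print(local_shape(512, 64, 64))     # 4096 ranks, each holding 8 x 8 x 512
```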

D12.1: Heterogeneous and Auto-tuned Runtime System

Christian Perez, Zhengxiong Hou, Judit Planas, Rosa Badia, Eduard Ayguadé, Jesus Labarta, Michael Schliephake, Chandan Basu, Johan Raber, Massimo Guarrasi, Lasse Natvig, Kostis Nikas (+5 others)
2013 Zenodo  
Furthermore, as it is widely accepted that the key to exploiting future high-end systems will be based on research on new numerical algorithms as well as advancing the parallel processing technology used  ...  The work in WP12 focuses on auto-tuned and automatic techniques to be applied in parallel programming model runtimes (Task 12.1: "Auto-tuned runtime Environments"), performance tools (Task 12.3: "Development  ...  Thus, the use of these algorithms can greatly improve the performance of the FFTW library on modern massively parallel supercomputers.  ... 
doi:10.5281/zenodo.6572371 fatcat:uttgomgovjeb5iopc2ccgyar7y

PFFT: An Extension of FFTW to Massively Parallel Architectures

Michael Pippig
2013 SIAM Journal on Scientific Computing  
We present an MPI-based software library for computing fast Fourier transforms on massively parallel, distributed-memory architectures.  ...  This framework can be generalized to arbitrary multi-dimensional data and process meshes. All performance-relevant building blocks can be implemented with the help of the FFTW software library.  ...  We wish to thank Sebastian Banert, who did some of the runtime measurements on JuRoPA and Jugene.  ... 
doi:10.1137/120885887 fatcat:34j6qj75ibf3vc2mippksmfpea
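A parallel FFT can be assembled from 1D FFT building blocks because the multi-dimensional DFT is separable. Applying 1D transforms along each axis in turn gives the same result as the full 3D DFT. In a distributed pencil code the data are transposed between axis passes so that each pass operates on locally complete lines. The serial sketch below checks only the separability. It omits MPI and the transposes entirely, and it uses a naive O(n²) DFT in place of FFTW plans. All function names are hypothetical.

```python
import cmath
import itertools


def dft1d(v):
    """Naive O(n^2) 1D DFT, standing in for an FFTW 1D plan."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def transform_axis(data, shape, axis):
    """Apply dft1d to every line of `data` (a dict keyed by index tuples)
    that runs along `axis`."""
    out = {}
    others = [range(s) for d, s in enumerate(shape) if d != axis]
    for rest in itertools.product(*others):
        # Rebuild the full index tuples of one line along `axis`
        line_idx = [rest[:axis] + (t,) + rest[axis:] for t in range(shape[axis])]
        for ix, val in zip(line_idx, dft1d([data[ix] for ix in line_idx])):
            out[ix] = val
    return out


def dft3d_by_axes(data, shape):
    """3D DFT as three passes of 1D transforms (axis 2, then 1, then 0)."""
    for axis in (2, 1, 0):
        data = transform_axis(data, shape, axis)
    return data


def dft3d_direct(data, shape):
    """Reference 3D DFT summed directly over all grid points."""
    grid = list(itertools.product(*(range(s) for s in shape)))
    out = {}
    for k in grid:
        acc = 0
        for x in grid:
            phase = sum(k[d] * x[d] / shape[d] for d in range(3))
            acc += data[x] * cmath.exp(-2j * cmath.pi * phase)
        out[k] = acc
    return out


# Usage: compare both approaches on a small 2 x 3 x 4 grid
shape = (2, 3, 4)
data = {ix: complex(ix[0] + 2 * ix[1], ix[2] - ix[0] * ix[1])
        for ix in itertools.product(*(range(s) for s in shape))}
a, b = dft3d_by_axes(data, shape), dft3d_direct(data, shape)
print(max(abs(a[k] - b[k]) for k in a))  # round-off level difference
```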

Scalability Improvements for DFT Codes Due to the Implementation of the 2D Domain Decomposition Algorithm

Massimiliano Guarrasi
2013 Zenodo  
The performance of this new algorithm is tested on two example applications: Quantum Espresso, a popular code used in materials science, and the CFD code BlowupNS.  ...  algorithm in DFT codes that use standard 1D (or slab) Parallel Domain Decomposition.  ...  Acknowledgements This work was financially supported by the PRACE-2IP project [15]  ... 
doi:10.5281/zenodo.831978 fatcat:qszuzfefxfdyxir7ocyfuo5ini

MEGADOCK 3.0: a high-performance protein-protein interaction prediction software using hybrid parallel computing for petascale supercomputing environments

Yuri Matsuzaki, Nobuyuki Uchikoga, Masahito Ohue, Takehiro Shimoda, Toshiyuki Sato, Takashi Ishida, Yutaka Akiyama
2013 Source Code for Biology and Medicine  
Massively parallel supercomputing systems have been actively developed over the past few years, which enable large-scale biological problems to be solved, such as PPI network prediction based on tertiary  ...  Results: We have developed a high throughput and ultra-fast PPI prediction system based on rigid docking, "MEGADOCK", by employing a hybrid parallelization (MPI/OpenMP) technique assuming usages on massively  ...  The technical assistance of Hikaru Inoue and Tomoyuki Noda in Fujitsu Co. Ltd. on using K computer is greatly acknowledged.  ... 
doi:10.1186/1751-0473-8-18 pmid:24004986 pmcid:PMC3847482 fatcat:g3famucp45e2vpuchsreynf46a

A generalized massively parallel ultra-high order FFT-based Maxwell solver [article]

Haithem Kallala, Jean-Luc Vay, Henri Vincenti
2018 arXiv   pre-print
This 'hybrid' technique was implemented in the open source exascale library PICSAR.  ...  A dual domain decomposition method is used for the Maxwell solver and other parts of the PIC cycle to keep the simulation load-balanced.  ...  Acknowledgements The authors would like to thank Rémi Lehe and Julien Derouillat for fruitful discussions.  ... 
arXiv:1812.07357v1 fatcat:hwv4t7cuozcc3l5g4yxbp36eue

Ultrahigh-order Maxwell solver with extreme scalability for electromagnetic PIC simulations of plasmas

Henri Vincenti, Jean-Luc Vay
2018 Computer Physics Communications  
The advent of massively parallel supercomputers, with their distributed-memory technology using many processing units, has favored the development of highly-scalable local low-order solvers at the expense  ...  We demonstrate here that a new method, based on the use of local FFTs, enables ultrahigh-order accuracy with unprecedented scalability, and thus for the first time the accurate modeling of plasma mirrors  ...  Irving Haber and Dr. Fabien Quere for fruitful discussions. We are also very grateful to Dr. A. Leblanc who provided us with the experimental results performed on the UHI100 laser at CEA Saclay.  ... 
doi:10.1016/j.cpc.2018.03.018 fatcat:7paa4v4sdjdczpngs2wti6bmni
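The two entries above rely on FFT-based (pseudo-spectral) field solvers. Their accuracy advantage can be seen on a single periodic grid. For a band-limited signal, differentiating in Fourier space (multiply by ik) is accurate to round-off. A second-order central difference on the same grid, by contrast, has an error of order h². The sketch below illustrates only that contrast. It is not the local-FFT domain decomposition scheme described in those papers. It uses one global grid and a naive DFT in place of an FFT library, and its function names are hypothetical.

```python
import cmath
import math


def dft(v, sign=-1):
    """Naive DFT: sign=-1 is the forward transform, sign=+1 the
    unnormalised inverse."""
    n = len(v)
    return [sum(v[j] * cmath.exp(sign * 2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]


def spectral_derivative(f, length):
    """d/dx of periodic samples f on [0, length) via multiplication by i*k."""
    n = len(f)
    fk = dft(f)
    # Wavenumbers in FFT ordering: 0, 1, ..., n/2 - 1, -n/2, ..., -1
    ks = [2 * math.pi / length * (k if k < n // 2 else k - n) for k in range(n)]
    dfk = [1j * kk * c for kk, c in zip(ks, fk)]
    if n % 2 == 0:
        dfk[n // 2] = 0  # drop the Nyquist mode, whose derivative is ambiguous
    return [(c / n).real for c in dft(dfk, sign=+1)]


def central_difference(f, h):
    """Second-order periodic central difference (f[j+1] - f[j-1]) / 2h."""
    n = len(f)
    return [(f[(j + 1) % n] - f[j - 1]) / (2 * h) for j in range(n)]


# Usage: differentiate sin(x) on 16 points over one period
n, length = 16, 2 * math.pi
h = length / n
x = [j * h for j in range(n)]
f = [math.sin(xi) for xi in x]
exact = [math.cos(xi) for xi in x]
err_spec = max(abs(a - b) for a, b in zip(spectral_derivative(f, length), exact))
err_fd = max(abs(a - b) for a, b in zip(central_difference(f, h), exact))
print(err_spec, err_fd)  # spectral error is at round-off level; FD error is about h^2/6
```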

Introducing ZEUS-MP: A 3D, Parallel, Multiphysics Code for Astrophysical Fluid Dynamics [article]

Michael L. Norman
2000 arXiv   pre-print
Parallelization is done by domain decomposition and implemented in F77 and MPI. The code is portable across a wide range of platforms from networks of workstations to massively parallel processors.  ...  ZEUS-MP is a follow-on to the sequential ZEUS-2D and ZEUS-3D codes developed and disseminated by the Laboratory for Computational Astrophysics ( at NCSA.  ...  The former utilizes the FFTW library developed at MIT, while the latter uses MGMPI.  ... 
arXiv:astro-ph/0005109v1 fatcat:fco6uups2najhhhklajkzbekgq
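Domain decomposition, as mentioned in the ZEUS-MP entry, begins by assigning each MPI rank a contiguous block of zones along each split axis. The sketch below shows one common way to balance that split when the zone count is not divisible by the rank count. The first n mod p ranks receive one extra zone. ZEUS-MP's actual partitioning and ghost-zone layout are not described in the snippet, so this is a generic illustration with a hypothetical helper name. A 3D decomposition would apply it independently along each axis.

```python
def block_range(n: int, nprocs: int, rank: int) -> tuple:
    """Half-open range [start, stop) of the zones owned by `rank` when n
    zones along one axis are split across nprocs ranks. The first
    n % nprocs ranks each receive one extra zone (hypothetical helper)."""
    base, extra = divmod(n, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Usage: 10 zones over 3 ranks
print([block_range(10, 3, r) for r in range(3)])  # [(0, 4), (4, 7), (7, 10)]
```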

D7.5: HPC Programming Techniques

Cevdet Aykanat, Antun Balaz, Iris Christadler, Ivan Girotto, Jose Gracia, Vladimir Slavnic, Andy Sunderland, Ata Türk
2012 Zenodo  
They ranged from the introduction of new algorithms for sparse matrix operations to the assessment of new languages like StarSs, Chapel, Cilk and ArBB; from the comparison of mathematical libraries to  ...  This task worked with users to implement new programming techniques, paradigms and algorithms for Tier-1 and Tier-0 systems, which have the potential to facilitate significant improvements in their applications  ...  The computer model is based on implementation of a new class of parallel numerical methods and algorithms for time dependent problems.  ... 
doi:10.5281/zenodo.6552939 fatcat:z2gdhmnojrh6bj2lordyl7lmnq

High performance Python for direct numerical simulations of turbulent flows

Mikael Mortensen, Hans Petter Langtangen
2016 Computer Physics Communications  
better than similar routines provided through the FFTW library.  ...  The solvers have been verified and benchmarked on the Shaheen supercomputer at the KAUST supercomputing laboratory, and we are able to show very good scaling up to several thousand cores.  ...  To establish scaling and benchmark results, we have run the codes on Shaheen, a massively parallel Blue Gene/P machine at the KAUST Supercomputing Laboratory.  ... 
doi:10.1016/j.cpc.2016.02.005 fatcat:wyrnkerqvvbnfonjr6xbhuaxpy

D7.3: Inventory of Exascale Tools and Techniques

Nicola Mc Donnell
2016 Zenodo  
Task 7.2, within WP7, 'Preparing for Future PRACE Exascale Systems' aims to investigate the various programming tools, languages, libraries and algorithms needed for future Exascale systems through an  ...  In Section 2, we summarise our findings separately by topic: programming interfaces and standards, debuggers and profilers, scalable libraries and algorithms and I/O management techniques, European Exascale  ...  Acknowledgements The authors would like to acknowledge and thank the Centres of Excellence for their cooperation with and contribution to this deliverable.  ... 
doi:10.5281/zenodo.6801725 fatcat:ez63t2znsvdcpnvijzi4c74dc4


Steve Petruzza, Aniketh Venkat, Attila Gyulassy, Giorgio Scorzelli, Frederick Federer, Alessandra Angelucci, Valerio Pascucci, Peer-Timo Bremer
2017 SIGGRAPH Asia 2017 Symposium on Visualization on - SA '17  
Second, a library simplifies the process of developing new analytics algorithms, allowing users to rapidly prototype new approaches and deploy them in an HPC setting.  ...  Furthermore, analysis on HPC systems often requires complex hand-written parallel implementations of algorithms that suffer from poor portability and maintainability.  ...  Furthermore, this component provides flexibility to easily implement an algorithm, test it interactively on local resources using the data streaming infrastructure (i.e.  ... 
doi:10.1145/3139295.3139299 pmid:30148289 pmcid:PMC6105268 dblp:conf/siggraph/PetruzzaVGSFAPB17 fatcat:ppcwkpk32jd65arcaw3nrnvzne

Code Optimization and Scaling of the Astrophysics Software GADGET on Intel Xeon Phi

P. Borovska
2014 Zenodo  
As a result, the hybrid MPI/OpenMP parallelization of the code has been enabled and scalability tests on the Intel Xeon Phi processors, on the PRACE EURORA system are reported.  ...  The whitepaper reports our investigation into the porting, optimization and subsequent performance of the astrophysics software package GADGET, on the Intel Xeon Phi.  ...  The project was realized using the EURORA System at CINECA, Italy.  ... 
doi:10.5281/zenodo.822647 fatcat:syggohdb3jgfjp4shttdzsvoiq

D12.2: Exploration of Scalable Numerical Algorithms

Cevdet Aykanat, Ata Turk
2012 Zenodo  
Furthermore, as it is widely accepted that the key to exploiting future high-end systems will be based on research on new numerical algorithms as well as advancing the parallel processing technology used  ...  in Heterogeneous Architectures Application Scalability Task 12.2 evaluated different algorithms, methods, and approaches and demonstrated the scalability of the algorithms using simple ad-hoc programs  ...  Newly developed and tuned algorithms and codes for massively parallel platforms such as the IBM Blue Gene/P are integrated and tested.  ... 
doi:10.5281/zenodo.6572347 fatcat:lbjrp3hjwnb2hkqx7znvuuqe4y

Benchmark Tests of Fusion Plasma Simulation Codes for Studying Microturbulence and Energetic-Particle Dynamics

Tomo-Hiko Watanabe, Yasushi Todo, Wendell Horton
2008 Plasma and Fusion Research  
Benchmark tests of two simulation codes used for studying microturbulence and energetic-particle dynamics in magnetic fusion plasmas are conducted on present-day parallel supercomputer systems.  ...  Both codes achieved high efficiency on the Earth Simulator with vector processors, and showed good performance scaling on massively parallel supercomputers with more than 10,000 commodity processors  ...  Some of the work carried out at the Japanese site is supported in part by grants-in-aid from the Ministry of Education, Culture, Sports, Science and Technology. This US-Japan research collaboration is also  ... 
doi:10.1585/pfr.3.061 fatcat:44g5llbqxnajnkwumd4sp5l63m
Showing results 1-15 out of 233 results