Experiences with Sweep3D implementations in Co-array Fortran

Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey
2006 Journal of Supercomputing  
In this paper, we present a study of several CAF implementations of Sweep3D on four cluster architectures.  ...  Our earlier studies show that CAF programs achieve similar performance to that of corresponding MPI codes for the NAS Parallel Benchmarks.  ...  Figure 3: Sweep3D-CAF-mb kernel pseudocode. Figure 4: MPI, CAF and ARMCI versions of the blocking PUT microbenchmark.  ...
doi:10.1007/s11227-006-7952-7 fatcat:fsdwr2bbtzdpnkgkeorn36rkly
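
The "blocking PUT microbenchmark" named in the figure caption is not reproduced above; purely as a rough illustration of what such a microbenchmark measures, the following is a minimal sketch in C of a blocking put timed over an MPI-3 RMA window. The message size, iteration count and window setup are placeholder assumptions, and the paper's MPI, CAF and ARMCI variants are not shown.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Minimal blocking-PUT timing sketch over an MPI-3 window.
     * Rank 0 repeatedly puts a buffer into rank 1's window and
     * reports the mean time per put. Sizes and counts are arbitrary. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 4096;      /* doubles per message (assumed) */
        const int iters = 1000;  /* repetitions (assumed) */
        double *winbuf;
        MPI_Win win;
        MPI_Win_allocate(n * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &winbuf, &win);

        double *src = malloc(n * sizeof(double));
        for (int i = 0; i < n; i++) src[i] = (double)i;

        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0 && size > 1) {
            double t0 = MPI_Wtime();
            for (int it = 0; it < iters; it++) {
                /* The unlock completes the transfer, so each
                 * iteration behaves as one blocking put. */
                MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
                MPI_Put(src, n, MPI_DOUBLE, 1, 0, n, MPI_DOUBLE, win);
                MPI_Win_unlock(1, win);
            }
            double t1 = MPI_Wtime();
            printf("mean blocking put time: %g us\n",
                   1e6 * (t1 - t0) / iters);
        }
        MPI_Barrier(MPI_COMM_WORLD);

        free(src);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }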

Co-array Fortran Performance and Potential: An NPB Experimental Study [chapter]

Cristian Coarfa, Yuri Dotsenko, Jason Eckhardt, John Mellor-Crummey
2004 Lecture Notes in Computer Science  
Section 3 proposes extensions to CAF to enable it to deliver portable high performance. In Section 4, we outline the implementation strategy of our source-to-source CAF compiler.  ...  In Section 6, we describe experiments using versions of the NAS parallel benchmarks to compare the performance of CAF and MPI.  ...  Wallcraft for providing us with draft CAF versions of the BT, CG, MG, and SP NAS parallel benchmarks. We thank F. Zhao for her work on the Open64/SL Fortran front-end.  ... 
doi:10.1007/978-3-540-24644-2_12 fatcat:kbmor7r3qjc5ll54h7fyllfqny

Multithreaded global address space communication techniques for gyrokinetic fusion applications on ultra-scale platforms

Robert Preissl, Nathan Wichmann, Bill Long, John Shalf, Stephane Ethier, Alice Koniges
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
We introduce new hybrid PGAS/OpenMP implementations of highly optimized hybrid MPI/OpenMP based communication kernels.  ...  The hybrid PGAS implementations use an extension of standard hybrid programming techniques, enabling the distribution of high communication work loads of the underlying kernel among OpenMP threads.  ...  In experiments using more than 4K processors we observed that the manually implemented CAF analogue to MPI Allreduce did not perform as well as the MPI implementation.  ... 
doi:10.1145/2063384.2063404 fatcat:43eusrdemvdihgycl77u5pddiq
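
For context on the allreduce comparison quoted above: a standard hybrid MPI/OpenMP pattern reduces across threads first and then issues a single MPI_Allreduce per process. The sketch below in C shows only that generic baseline pattern; it is not the paper's communication kernel or its CAF analogue, and the problem size is an assumption.

    #include <mpi.h>
    #include <stdio.h>

    /* Hybrid MPI/OpenMP reduction sketch: threads reduce locally,
     * then one MPI_Allreduce combines the per-process partial sums. */
    int main(int argc, char **argv) {
        int provided;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const long n = 1000000;   /* elements per process (assumed) */
        double local = 0.0;

        /* Thread-level reduction inside each MPI process. */
        #pragma omp parallel for reduction(+:local)
        for (long i = 0; i < n; i++)
            local += 1.0 / (double)(i + 1 + rank);

        /* Process-level reduction across all ranks. */
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("global sum = %f\n", global);

        MPI_Finalize();
        return 0;
    }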

An evaluation of global address space languages

Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey, François Cantonnet, Tarek El-Ghazawi, Ashrujit Mohanti, Yiyi Yao, Daniel Chavarría-Miranda
2005 Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '05  
We compare CAF and UPC variants of these programs with the original Fortran+MPI code. Today, CAF and UPC programs deliver scalable performance on clusters only when written to use bulk communication.  ...  Co-array Fortran (CAF) and Unified Parallel C (UPC) are two emerging languages for single-program, multiple-data global address space programming.  ...  The CAF implementation uses point-to-point synchronization, while the UPC implementation uses split-phase barrier synchronization.  ... 
doi:10.1145/1065944.1065950 dblp:conf/ppopp/CoarfaDMCEMYC05 fatcat:pnprmfmb4bgpdmnxauwnvzddhy
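
The split-phase barrier mentioned in the snippet is a UPC feature (upc_notify/upc_wait); since no UPC code appears here, the sketch below uses the closest MPI-3 analogue in C, a nonblocking barrier that lets independent local work proceed while the barrier completes. It is an illustration of the idea only, not code from the paper.

    #include <mpi.h>
    #include <stdio.h>

    /* Split-phase barrier sketch: enter the barrier (MPI_Ibarrier),
     * do independent local work, then wait for the barrier to complete.
     * This mimics UPC's upc_notify/upc_wait pair. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;

        /* "notify" phase: signal arrival without blocking */
        MPI_Request req;
        MPI_Ibarrier(MPI_COMM_WORLD, &req);

        /* local work that does not depend on other ranks */
        for (int i = 1; i <= 100000; i++)
            local += 1.0 / i;

        /* "wait" phase: block until every rank has arrived */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        if (rank == 0)
            printf("local work result: %f\n", local);

        MPI_Finalize();
        return 0;
    }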

Coarray-based load balancing on heterogeneous and many-core architectures

Valeria Cardellini, Alessandro Fanfarillo, Salvatore Filippone
2017 Parallel Computing  
In this highly dynamic scenario, Partitioned Global Address Space (PGAS) languages, like Coarray Fortran, appear a promising alternative to standard MPI programming that uses two-sided communications,  ...  In this paper, we show how Coarray Fortran can be used for implementing dynamic load balancing algorithms on an exascale compute node and how these algorithms can produce performance benefits for an Asian  ...  Even though the CAF implementation used in the tests is based on MPI-3.0, it provides better performance than the explicit MPI two-sided implementation, because the communication pattern required by the  ... 
doi:10.1016/j.parco.2017.06.001 fatcat:icnn7lqkeze55iguprjctn2c7i
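
The paper's load balancing algorithms are not reproduced in the snippet; one common building block for dynamic load balancing over MPI-3 one-sided operations (the layer the coarray runtime used in the tests is built on) is an atomically incremented global work counter. The sketch below in C is a generic illustration under an assumed task count, not the paper's method.

    #include <mpi.h>
    #include <stdio.h>

    /* Dynamic load balancing sketch: a global task counter lives in a
     * window on rank 0; every rank atomically fetches-and-adds it to
     * claim the next task until all tasks have been taken. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const long ntasks = 1000;   /* total tasks (assumed) */
        long *counter;
        MPI_Win win;

        /* Rank 0 hosts the counter; the other ranks expose empty windows. */
        MPI_Aint winsize = (rank == 0) ? sizeof(long) : 0;
        MPI_Win_allocate(winsize, sizeof(long), MPI_INFO_NULL,
                         MPI_COMM_WORLD, &counter, &win);

        if (rank == 0) {
            /* initialize the hosted counter under a self-lock so the
             * stored value becomes visible to remote atomics */
            MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
            *counter = 0;
            MPI_Win_unlock(0, win);
        }
        MPI_Barrier(MPI_COMM_WORLD);

        long done = 0;
        MPI_Win_lock_all(0, win);
        for (;;) {
            long task, one = 1;
            /* atomically claim the next task index from rank 0 */
            MPI_Fetch_and_op(&one, &task, MPI_LONG, 0, 0, MPI_SUM, win);
            MPI_Win_flush(0, win);
            if (task >= ntasks) break;
            done++;                 /* placeholder for real task work */
        }
        MPI_Win_unlock_all(win);

        printf("rank %d processed %ld tasks\n", rank, done);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }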

Hiding latency in Coarray Fortran 2.0

William N. Scherer, Laksono Adhianto, Guohua Jin, John Mellor-Crummey, Chaoran Yang
2010 Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10  
We outline how these operations are implemented and describe code fragments from several benchmark programs to show how we use these operations to hide latency by overlapping communication and computation.  ...  Acknowledgments We acknowledge Fengmei Zhao for her implementation of the bulk of the Coarray Fortran 2.0 translator.  ...  Our CAF 2.0 implementation of HPL implements a sophisticated tiling of the computation, capable of varying both the logical topology used to organize processor cores, as well as the width of data panels  ...
doi:10.1145/2020373.2020387 dblp:conf/pgas/SchererAJMY10 fatcat:lcju6lx4dfeahchhblyh7bukxq
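
The snippet describes hiding latency by overlapping communication with computation; the CAF 2.0 operations themselves are not shown above. The sketch below illustrates the same overlap pattern in C using a request-based MPI-3 get (MPI_Rget) that is started early and completed only when the remote data is needed. Buffer names and sizes are assumptions, not the paper's code.

    #include <mpi.h>
    #include <stdio.h>

    /* Latency-hiding sketch: start a nonblocking get of remote data,
     * overlap it with independent local computation, and wait for the
     * request only when the remote values are actually needed. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int n = 1024;
        double *winbuf, remote[1024], local_sum = 0.0, remote_sum = 0.0;
        MPI_Win win;
        MPI_Win_allocate(n * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &winbuf, &win);

        /* fill the locally owned window under a self-lock so the values
         * are visible to remote gets, then synchronize */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
        for (int i = 0; i < n; i++) winbuf[i] = rank + i;
        MPI_Win_unlock(rank, win);
        MPI_Barrier(MPI_COMM_WORLD);

        int partner = (rank + 1) % size;
        MPI_Request req;
        MPI_Win_lock(MPI_LOCK_SHARED, partner, 0, win);
        /* start the communication early ... */
        MPI_Rget(remote, n, MPI_DOUBLE, partner, 0, n, MPI_DOUBLE, win, &req);

        /* ... and overlap it with computation on purely local data */
        for (int i = 0; i < n; i++) local_sum += winbuf[i];

        /* complete the transfer only when the remote data is needed */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        for (int i = 0; i < n; i++) remote_sum += remote[i];
        MPI_Win_unlock(partner, win);

        printf("rank %d: local %g remote %g\n", rank, local_sum, remote_sum);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }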

Experiences with Co-array Fortran on Hardware Shared Memory Platforms [chapter]

Yuri Dotsenko, Cristian Coarfa, John Mellor-Crummey, Daniel Chavarría-Miranda
2005 Lecture Notes in Computer Science  
We describe a set of implementation alternatives and evaluate their performance implications for CAF variants of the STREAM, Random Access, Spark98 and NAS MG & SP benchmarks.  ...  We compare the performance of library-based implementations of one-sided communication with finegrain communication that accesses remote data using load and store operations.  ...  For each benchmark, we present the parallel efficiency of the MPI, CAF and OpenMP implementations 5 .  ... 
doi:10.1007/11532378_24 fatcat:mwn6ch2oqvcz5dg5yww3p3owdq

DART-MPI: An MPI-based Implementation of a PGAS Runtime System [article]

Huan Zhou, Yousri Mhedheb, Kamran Idrees, Colin W. Glass, José Gracia, Karl Fürlinger, Jie Tao
2015 arXiv   pre-print
A specific feature of our implementation is the use of one-sided communication of the Message Passing Interface (MPI) version 3 (i.e. MPI-3) as the underlying communication substrate.  ...  We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.  ...  IMPLEMENTATION WITH MPI-3 We begin with an overview of the MPI-3 standard, and then depict the way of applying MPI-3 to the DART implementation.  ... 
arXiv:1507.01773v1 fatcat:zip7uoncobdedoswae2nw6tyhu
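
The DART API itself is not reproduced above; as a generic illustration of layering a PGAS-style get on the MPI-3 one-sided substrate the abstract refers to, the sketch below in C translates a global element index into a (rank, offset) pair over a block-distributed window and fetches it with MPI_Get. The block size, function names and distribution are assumptions, not DART-MPI code.

    #include <mpi.h>
    #include <stdio.h>

    #define BLOCK 256   /* elements owned by each rank (assumed) */

    /* Fetch one element of a block-distributed "global array" that is
     * backed by an MPI-3 window on every rank. */
    static double global_get(long gindex, MPI_Win win) {
        int owner = (int)(gindex / BLOCK);             /* which rank holds it */
        MPI_Aint offset = (MPI_Aint)(gindex % BLOCK);  /* where in its block */
        double value;
        MPI_Win_lock(MPI_LOCK_SHARED, owner, 0, win);
        MPI_Get(&value, 1, MPI_DOUBLE, owner, offset, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(owner, win);                    /* completes the get */
        return value;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double *block;
        MPI_Win win;
        MPI_Win_allocate(BLOCK * sizeof(double), sizeof(double),
                         MPI_INFO_NULL, MPI_COMM_WORLD, &block, &win);

        /* fill the locally owned block under a self-lock so the values
         * become visible to remote gets */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
        for (int i = 0; i < BLOCK; i++) block[i] = rank * BLOCK + i;
        MPI_Win_unlock(rank, win);
        MPI_Barrier(MPI_COMM_WORLD);

        /* every rank reads one element owned by the next rank */
        long gindex = ((long)(rank + 1) % size) * BLOCK + 7;
        printf("rank %d read global[%ld] = %g\n", rank, gindex,
               global_get(gindex, win));

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }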

Parallel Event Selection on HPC Systems

Marc Paterno, Jim Kowalkowski, Saba Sehrish, A. Forti, L. Betev, M. Litmaath, O. Smirnova, P. Hristov
2019 EPJ Web of Conferences  
We use MPI, numpy and h5py to implement our approach and compare the performance with the existing approach.  ...  We represent our n-tuple data in HDF5 format that is optimized for the HPC environment and which allows us to use the machine's high-performance parallel I/O capabilities.  ...  Acknowledgments We would like to thank the members of the NOvA collaboration for providing us with the relevant material for this work. This manuscript has been authored by  ... 
doi:10.1051/epjconf/201921404059 fatcat:47czr7pos5f4fap7xthjh3gaue
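
The implementation described above is in Python (MPI, numpy, h5py) over HDF5 and is not reproduced here; purely as a sketch of the event-parallel structure in C, the fragment below splits an assumed event range evenly across ranks, applies a dummy selection cut in place of reading the real n-tuple data, and combines the per-rank counts with a reduction.

    #include <mpi.h>
    #include <stdio.h>

    /* Event-parallel selection sketch: each rank takes a contiguous
     * slice of the event range, applies a selection cut, and the
     * per-rank counts are combined with a reduction. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long nevents = 1000000;   /* total events (assumed) */
        long per = nevents / size, rem = nevents % size;
        long first = rank * per + (rank < rem ? rank : rem);
        long count = per + (rank < rem ? 1 : 0);

        /* dummy cut standing in for reading this slice of the HDF5
         * n-tuple and evaluating the real selection */
        long selected = 0;
        for (long i = first; i < first + count; i++)
            if (i % 20 == 0) selected++;    /* keep ~5% of events */

        long total = 0;
        MPI_Reduce(&selected, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("selected %ld of %ld events\n", total, nevents);

        MPI_Finalize();
        return 0;
    }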

Experiences Developing the OpenUH Compiler and Runtime Infrastructure

Barbara Chapman, Deepak Eachempati, Oscar Hernandez
2012 International journal of parallel programming  
For the past several years, we have used OpenUH to conduct research in parallel programming models and their implementation, static and dynamic analysis of parallel applications, and compiler integration  ...  In this paper, we describe the evolution of the OpenUH infrastructure and how we've used it to carry out our research and teaching efforts.  ...  , and (3) a portable runtime library. 1) Front-end: We modified the Cray Fortran 95 front-end used by OpenUH to support our coarrays implementation.  ... 
doi:10.1007/s10766-012-0230-9 fatcat:nc2eqlg3nzbnth5emhkrwo3aiq

A highly scalable particle tracking algorithm using partitioned global address space (PGAS) programming for extreme-scale turbulence simulations

D. Buaria, P.K. Yeung
2017 Computer Physics Communications  
This transfer is implemented very efficiently as a one-sided communication, using Co-Array Fortran (CAF) features which facilitate small data movements between different local partitions of a large global  ...  For operations on the particles in a 8192^3 simulation (0.55 trillion grid points) on 262,144 Cray XE6 cores, the new algorithm is found to be orders of magnitude faster relative to a prior algorithm in  ...  Fiedler of Cray Inc. for his advice on use of Co-Array Fortran, staff members of the Blue Waters project for their valuable assistance, and K.  ... 
doi:10.1016/j.cpc.2017.08.022 fatcat:lywopktymjau7lhy47cr5jo5tm

A new vision for coarray Fortran

John Mellor-Crummey, Laksono Adhianto, William N. Scherer, Guohua Jin
2009 Proceedings of the Third Conference on Partitioned Global Address Space Programming Models - PGAS '09  
Careful review of drafts of the emerging Fortran 2008 standard led us to identify several shortcomings with the proposed coarray extensions.  ...  this paper, we briefly critique the coarray extensions proposed for Fortran 2008, outline a new vision for coarrays in Fortran language that is far more expressive, and briefly describe our strategy for implementing  ...  Figure 3: The allocate statement. Figure 4: Using a copointer.  ...
doi:10.1145/1809961.1809969 fatcat:xhhk3klg4vfczftr4yk5c7ea6u

Evaluation of Remote Memory Access Communication on the Cray XT3

V. Tipparaju, A. Kot, J. Nieplocha, M. ten Bruggencate, N. Chrisochoides
2007 2007 IEEE International Parallel and Distributed Processing Symposium  
The performance of these interfaces is studied and compared to MPI performance.  ...  This paper evaluates remote memory access (RMA) communication capabilities and performance on the Cray XT3.  ...  It was also used to implement several parallel programming models such as CAF, Global Arrays or GPSHMEM [12].  ...
doi:10.1109/ipdps.2007.370478 dblp:conf/ipps/TipparajuKNBC07 fatcat:c7cllt26ubb3foqnwbof7paue4
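
The XT3 measurements above cover SHMEM, ARMCI and MPI-2 one-sided interfaces; none of that benchmark code is shown. Purely to illustrate the RMA programming style being evaluated, the following is a minimal one-sided put in C using the present-day OpenSHMEM API (not the Cray SHMEM of that platform); buffer names and sizes are assumptions.

    #include <shmem.h>
    #include <stdio.h>

    #define N 8

    /* Static arrays live in the symmetric data segment, so every PE
     * can address this buffer on every other PE. */
    static double dest[N];

    /* One-sided RMA sketch: each PE writes a small buffer directly
     * into the next PE's memory; no matching receive is involved. */
    int main(void) {
        shmem_init();
        int me = shmem_my_pe();
        int npes = shmem_n_pes();
        int right = (me + 1) % npes;

        double src[N];
        for (int i = 0; i < N; i++) src[i] = me * 100.0 + i;

        shmem_double_put(dest, src, N, right);  /* remote write to PE 'right' */
        shmem_barrier_all();                    /* complete puts and synchronize */

        printf("PE %d received %g.. from PE %d\n",
               me, dest[0], (me - 1 + npes) % npes);

        shmem_finalize();
        return 0;
    }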

A UPC++ Actor Library and Its Evaluation On a Shallow Water Proxy Application

Alexander Pöppl, Scott Baden, Michael Bader
2019 2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM)  
The most widely used approach here is to use MPI for inter-node communication and parallelization, and OpenMP for the on-node parallelization.  ...  Another promising model is the task-based programming model [3]. Here, the programmer specifies pieces of computation and communication as tasks, and also their dependencies.  ...  In essence, the model implemented in Charm++ resembles the CAF actor model more closely than ours.  ...
doi:10.1109/paw-atm49560.2019.00007 dblp:conf/sc/PopplBB19 fatcat:ecudfm2vvvavfbkxqd7um4ajoe
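
The actor library itself is not shown above; as a generic illustration of the task-based model the snippet describes (pieces of computation plus explicit dependencies), the sketch below uses OpenMP tasks in C, where the depend clauses ensure the combining task runs only after both producer tasks finish. It is unrelated to the paper's UPC++ implementation.

    #include <stdio.h>

    /* Task-based sketch: two independent "compute" tasks produce a and b,
     * and a third task that depends on both combines the results. */
    int main(void) {
        double a = 0.0, b = 0.0, c = 0.0;

        #pragma omp parallel
        #pragma omp single
        {
            #pragma omp task depend(out: a)
            {
                for (int i = 1; i <= 1000; i++) a += 1.0 / i;        /* task 1 */
            }

            #pragma omp task depend(out: b)
            {
                for (int i = 1; i <= 1000; i++) b += 1.0 / (i * i);  /* task 2 */
            }

            /* runs only after both producing tasks have completed */
            #pragma omp task depend(in: a, b) depend(out: c)
            {
                c = a + b;
            }

            #pragma omp taskwait
        }

        printf("c = %f\n", c);
        return 0;
    }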
Showing results 1 — 15 out of 179 results