The design and implementation of a parallel array operator for the arbitrary remapping of data
2003
SIGPLAN notices
In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks. ...
In this paper, we present a highly-general array operator with powerful gather and scatter capabilities unmatched by other array languages. ...
Moreover, the operator is general enough to apply to most array languages. • We discuss a parallel implementation for the operator and introduce optimizations for schedule compression, dead array reuse ...
doi:10.1145/966049.781526
fatcat:crkdjcce3rd75oxwgxmhyjkfcu
The design and implementation of a parallel array operator for the arbitrary remapping of data
2003
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03
In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks. ...
In this paper, we present a highly-general array operator with powerful gather and scatter capabilities unmatched by other array languages. ...
Moreover, the operator is general enough to apply to most array languages. • We discuss a parallel implementation for the operator and introduce optimizations for schedule compression, dead array reuse ...
doi:10.1145/781498.781526
dblp:conf/ppopp/DeitzCCS03
fatcat:xhjhitem2fcl7j33z7wcq7nldy
The design and implementation of a parallel array operator for the arbitrary remapping of data
2003
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '03
In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks. ...
In this paper, we present a highly-general array operator with powerful gather and scatter capabilities unmatched by other array languages. ...
Moreover, the operator is general enough to apply to most array languages. • We discuss a parallel implementation for the operator and introduce optimizations for schedule compression, dead array reuse ...
doi:10.1145/781524.781526
fatcat:xrpu62x73rg5vn2dsfc4ezazke
Collective Communications for Scalable Programming
[chapter]
2005
Lecture Notes in Computer Science
HPJava is an environment for scientific and parallel programming using Java. It is based on an extended version of the Java language. ...
One feature that HPJava adds to Java is a multi-dimensional array, or multiarray, with properties similar to the arrays of Fortran. ...
Java looks like a promising alternative for the future. We have discussed in detail the design and development of a high-level library for HPJava; this is essentially a communication library. ...
doi:10.1007/11576235_33
fatcat:licvaoaf7zhy5in2nhq6zroc5q
Runtime support for scalable programming in Java
2007
Journal of Supercomputing
This communication library supports collective operations on distributed arrays. We include Java Object as one of the Adlib communication data types. ...
Our HPJava is based around a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries. ...
The source and destination arrays must have the same shape, and they must also be identically aligned. By design, shift() implements a simpler pattern of communication than general remap(). ...
doi:10.1007/s11227-007-0125-5
fatcat:nlmxxwvftvforg7rumfnyu5vve
Global arrays
1994
Supercomputing, Proceedings
• designed to complement rather than substitute the message-passing model
• leads to simple coding and efficient execution for a class of applications. ...
patches)
Performance of scaled add operation using access vs. get/put operation
Visualization tool
• for 2D data only
• Helps the programmer design efficient task scheduling strategies
• The ...
doi:10.1145/602831.602833
fatcat:jn7xo432dvfqlpjoah2oae2k4e
Global arrays
1994
Supercomputing, Proceedings
• designed to complement rather than substitute the message-passing model
• leads to simple coding and efficient execution for a class of applications. ...
patches)
Performance of scaled add operation using access vs. get/put operation
Visualization tool
• for 2D data only
• Helps the programmer design efficient task scheduling strategies
• The ...
doi:10.1145/602770.602833
fatcat:q3khaiyaoba2tczsar4drirwpe
Efficient Parallel I/O in Community Atmosphere Model (CAM)
2008
The international journal of high performance computing applications
We describe the parallel I/O development of CAM in this paper. The parallel I/O combines a novel remapping of 3-D arrays with the parallel netCDF library as the I/O interface. ...
For a standard single history output of CAM 3.1 FV-D resolution run (multiple 2-D and 3-D arrays with total size 4.1 GB), our parallel I/O speeds up by a factor of 14 on IBM SP3, compared with the existing ...
We thank the PnetCDF team, especially Jian Li of NWU and Rob Ross of ANL, for answering a large number of questions and the quick fixing of many bugs. ...
doi:10.1177/1094342008090914
fatcat:b2jihymhzvf4zedfbm2dyj57jy
Optimal Compilation of HPF Remappings
1996
Journal of Parallel and Distributed Computing
It is proved optimal: A minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in hpfc, our prototype HPF compiler. ...
This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures. ...
... for the improvements he suggested, Pierre Jouvelot for corrections, Philippe Marquet for technical support on the farm, William Pugh for suggestions and Xavier ...
doi:10.1006/jpdc.1996.0143
fatcat:awhwafqdkfeeti5jtwagisylgu
Streamlining GPU applications on the fly
2010
Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10
However, as GPUs are designed for massive data-parallel computing, their performance is subject to the presence of condition statements in a GPU application. ...
It introduces an abstract form of GPU applications, based on which, it describes the use of reference redirection and data layout transformation for remapping data and threads to minimize thread divergences ...
And no two threads write to a common memory location. 2) For an arbitrary input data element accessed by thread i, there must be a counterpart in the input data set of thread j. ...
doi:10.1145/1810085.1810104
dblp:conf/ics/ZhangJGS10
fatcat:5txfxo7x35hvfcn667ba5wkvtm
Fast parallel multidimensional FFT using advanced MPI
2019
Journal of Parallel and Distributed Computing
We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms. ...
For a range of strong and weak scaling tests, we found the overall performance of our method to be on par and often better than well-established libraries like MPI-FFTW, P3DFFT, and 2DECOMP&FFT. ...
The poor performance of the shared intra-node mode of operation is well known and has been the center of much focus, especially for supercomputers, which have been moving towards multicore designs, see ...
doi:10.1016/j.jpdc.2019.02.006
fatcat:zamt5hl64jcmbch25r3kqu7zrm
ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems
[chapter]
1999
Lecture Notes in Computer Science
By decoupling synchronization between the process that needs the data and the process that owns the data from the actual data transfer, implementation of parallel algorithms that operate on distributed ...
In this paper, we describe design, implementation and experience with the ARMCI (Aggregate Remote Memory Copy Interface), a new portable remote memory copy library we have been developing for optimized ...
Figure 6: Timings for original MPI vs. new ARMCI implementation of remap. The operation is a particular redistribution of an N by N array. ...
doi:10.1007/bfb0097937
fatcat:ehkwhem54naetmkbrdujjf3f3u
An Introduction to High Performance Fortran
1995
Scientific Programming
This article provides a tutorial introduction to the main features of HPF. ...
High Performance Fortran (HPF) is an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation. ...
ACKNOWLEDGMENTS We would like to thank Bryan Carpenter for providing the Gaussian elimination example, John Eastmond for critically reading the manuscript, and Jerry Wagener for providing information and ...
doi:10.1155/1995/612973
fatcat:ayunnf5ojbdi5kttl257dhfbc4
Computing without processors
2012
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems - CASES '12
different targets. • Accelerator achieves this by constraining the data types used for parallel programming and by providing a restricted set of parallel array access operations. • Not all kinds of data-parallel ...
it's difficult to map the arbitrary array indexing operations into efficient memory access operations on various targets. • A better way to express this computation is in terms of a whole array operation ...
doi:10.1145/2380403.2380406
fatcat:32wfp6qs4rcq5fkwt6sa5xyh5i
Computing without processors
2012
Proceedings of the tenth ACM international conference on Embedded software - EMSOFT '12
different targets. • Accelerator achieves this by constraining the data types used for parallel programming and by providing a restricted set of parallel array access operations. • Not all kinds of data-parallel ...
it's difficult to map the arbitrary array indexing operations into efficient memory access operations on various targets. • A better way to express this computation is in terms of a whole array operation ...
doi:10.1145/2380356.2380359
fatcat:7knbolqninhwvogtsxnnytrwpe