The design and implementation of a parallel array operator for the arbitrary remapping of data

Steven J. Deitz, Bradford L. Chamberlain, Sung-Eun Choi, Lawrence Snyder
2003 SIGPLAN notices  
In our implementation of this operator in ZPL, we demonstrate performance comparable to the hand-coded Fortran + MPI versions of the NAS FT and CG benchmarks.  ...  In this paper, we present a highly general array operator with powerful gather and scatter capabilities unmatched by other array languages.  ...  Moreover, the operator is general enough to apply to most array languages. • We discuss a parallel implementation for the operator and introduce optimizations for schedule compression, dead array reuse  ...
doi:10.1145/966049.781526 fatcat:crkdjcce3rd75oxwgxmhyjkfcu
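
The gather and scatter capabilities this abstract describes can be sketched with NumPy fancy indexing. This is a hypothetical illustration of the two directions of an arbitrary remap, not the ZPL operator itself:

```python
import numpy as np

# A remap in the gather direction: dst[i] = src[map[i]].
# The index map may be an arbitrary permutation.
src = np.array([10, 20, 30, 40])
gather_map = np.array([3, 0, 2, 1])
gathered = src[gather_map]          # -> [40, 10, 30, 20]

# The scatter direction: dst[map[i]] = src[i].
scattered = np.empty_like(src)
scattered[gather_map] = src         # -> [20, 40, 30, 10]
```

In a distributed setting each element of the map may name data owned by another process, which is why a general remap can require all-to-all communication while still admitting optimizations such as the schedule compression mentioned above.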

Collective Communications for Scalable Programming [chapter]

Sang Boem Lim, Bryan Carpenter, Geoffrey Fox, Han-Ku Lee
2005 Lecture Notes in Computer Science  
HPJava is an environment for scientific and parallel programming using Java. It is based on an extended version of the Java language.  ...  One feature that HPJava adds to Java is a multi-dimensional array, or multiarray, with properties similar to the arrays of Fortran.  ...  Java looks like a promising alternative for the future. We have discussed in detail the design and development of a high-level library for HPJava; this is essentially a communication library.  ...
doi:10.1007/11576235_33 fatcat:licvaoaf7zhy5in2nhq6zroc5q

Runtime support for scalable programming in Java

Sang Boem Lim, Hanku Lee, Bryan Carpenter, Geoffrey Fox
2007 Journal of Supercomputing  
This communication library supports collective operations on distributed arrays. We include Java Object as one of the Adlib communication data types.  ...  HPJava is based on a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries.  ...  The source and destination arrays must have the same shape, and they must also be identically aligned. By design, shift() implements a simpler pattern of communication than the general remap().  ...
doi:10.1007/s11227-007-0125-5 fatcat:nlmxxwvftvforg7rumfnyu5vve
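
The contrast the abstract draws between shift() and the general remap() can be illustrated with a toy sketch. These are assumed semantics for illustration, not the Adlib API:

```python
import numpy as np

a = np.arange(6)                    # [0, 1, 2, 3, 4, 5]

# shift(): a fixed displacement, identical for every element; in a
# distributed array this needs only neighbor-to-neighbor exchange.
shifted = np.roll(a, 2)             # -> [4, 5, 0, 1, 2, 3]

# remap(): an arbitrary index map; distributed, this may require
# all-to-all communication.
index_map = np.array([5, 3, 1, 0, 2, 4])
remapped = a[index_map]             # -> [5, 3, 1, 0, 2, 4]

# A shift is just the special case where the map is a rotation.
rotation = (np.arange(6) - 2) % 6
assert np.array_equal(a[rotation], shifted)
```

The simpler pattern is what lets a library schedule shift() without building a full communication schedule, which is the design point the snippet alludes to.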

Global arrays

Jaroslaw Nieplocha, Robert J. Harrison, Richard J. Littlefield
1994 Supercomputing, Proceedings  
• designed to complement rather than substitute the message-passing model • leads to simple coding and efficient execution for a class of applications.  ...  patches) Performance of scaled add operation using access vs. get/put operation. Visualization tool • for 2D data only • helps the programmer design efficient task scheduling strategies • The  ...
doi:10.1145/602831.602833 fatcat:jn7xo432dvfqlpjoah2oae2k4e
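
The one-sided get/put-on-patches model the slides refer to can be mimicked with a minimal toy class. The names here are hypothetical; the real Global Arrays API is a C/Fortran library:

```python
import numpy as np

class ToyGlobalArray:
    """A shared 2-D array accessed by rectangular patches, loosely
    mimicking the Global Arrays get/put model."""

    def __init__(self, shape):
        self.data = np.zeros(shape)

    def put(self, lo, hi, patch):
        # One-sided write of the rectangular patch [lo, hi).
        self.data[lo[0]:hi[0], lo[1]:hi[1]] = patch

    def get(self, lo, hi):
        # One-sided read: the caller needs no cooperation from
        # the process that "owns" the patch.
        return self.data[lo[0]:hi[0], lo[1]:hi[1]].copy()

ga = ToyGlobalArray((4, 4))
ga.put((0, 0), (2, 2), np.ones((2, 2)))
patch = ga.get((1, 1), (3, 3))      # overlaps the written patch at (1, 1)
```

The point of the model, as the slides note, is that patch-wise get/put complements message passing rather than replacing it.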

Efficient Parallel I/O in Community Atmosphere Model (CAM)

Yu-Heng Tseng, Chris Ding
2008 The international journal of high performance computing applications  
We describe the parallel I/O development of CAM in this paper. The parallel I/O combines a novel remapping of 3-D arrays with the parallel netCDF library as the I/O interface.  ...  For a standard single history output of a CAM 3.1 FV-D resolution run (multiple 2-D and 3-D arrays with total size 4.1 GB), our parallel I/O speeds up by a factor of 14 on IBM SP3, compared with the existing  ...  We thank the PnetCDF team, especially Jian Li of NWU and Rob Ross of ANL, for answering a large number of questions and quickly fixing many bugs.  ...
doi:10.1177/1094342008090914 fatcat:b2jihymhzvf4zedfbm2dyj57jy

Optimal Compilation of HPF Remappings

Fabien Coelho, Corinne Ancourt
1996 Journal of Parallel and Distributed Computing  
It is proved optimal: a minimal number of messages, containing only the required data, is sent over the network. The technique is fully implemented in hpfc, our prototype HPF compiler.  ...  This paper presents a new compilation technique to handle HPF remappings for message-passing parallel architectures.  ...  for the improvements he suggested, Pierre Jouvelot for corrections, Philippe Marquet for technical support on the farm, William Pugh for suggestions and Xavier  ...
doi:10.1006/jpdc.1996.0143 fatcat:awhwafqdkfeeti5jtwagisylgu
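
The "minimal messages containing only the required data" property can be sketched for the simplest case, a BLOCK to CYCLIC remapping. This is a hypothetical helper for illustration, not hpfc's compilation algorithm:

```python
# Which elements must process p send to process q when an array of
# length n is remapped from BLOCK to CYCLIC over nprocs processes?
# Only the intersection of p's block and q's cyclic slots is sent,
# so each message carries exactly the required data and no more.
def block_to_cyclic_messages(n, nprocs):
    # Assumes nprocs divides n, for simplicity.
    blk = n // nprocs
    msgs = {}
    for i in range(n):
        p = i // blk          # block owner (sender)
        q = i % nprocs        # cyclic owner (receiver)
        msgs.setdefault((p, q), []).append(i)
    return msgs

msgs = block_to_cyclic_messages(8, 2)
# Process 0 owns block [0..3] and sends its odd indices to process 1.
```

A compiler can enumerate these intersections symbolically at compile time rather than per element, which is where the optimality argument in the paper does its work.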

Streamlining GPU applications on the fly

Eddy Z. Zhang, Yunlian Jiang, Ziyu Guo, Xipeng Shen
2010 Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10  
However, as GPUs are designed for massive data-parallel computing, their performance is subject to the presence of condition statements in a GPU application.  ...  It introduces an abstract form of GPU applications, based on which it describes the use of reference redirection and data layout transformation for remapping data and threads to minimize thread divergences  ...  And no two threads write to a common memory location. 2) For an arbitrary input data element accessed by thread i, there must be a counterpart in the input data set of thread j.  ...
doi:10.1145/1810085.1810104 dblp:conf/ics/ZhangJGS10 fatcat:5txfxo7x35hvfcn667ba5wkvtm
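
The data-remapping idea, reordering elements so that threads in the same warp take the same branch, can be sketched as a simple predicate sort. This is an illustrative reduction of the idea, not the paper's runtime system:

```python
import numpy as np

# Neighboring threads diverge when they take different branches of a
# condition. Stably sorting elements by their predicate groups
# same-branch work together, so each contiguous "warp" is uniform.
data = np.array([5, -2, 7, -9, 1, -4, 3, -8])
pred = data > 0                           # branch each thread would take

order = np.argsort(~pred, kind="stable")  # true-branch elements first
remapped = data[order]

# With a warp size of 4, both warps are now branch-uniform.
warps = remapped.reshape(2, 4)
uniform = [len(set(w > 0)) == 1 for w in warps]
```

On a real GPU the remap is paired with reference redirection so that each thread still finds its data; the sketch only shows why grouping helps.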

Fast parallel multidimensional FFT using advanced MPI

Lisandro Dalcin, Mikael Mortensen, David E. Keyes
2019 Journal of Parallel and Distributed Computing  
We present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms.  ...  For a range of strong and weak scaling tests, we found the overall performance of our method to be on par with, and often better than, well-established libraries like MPI-FFTW, P3DFFT, and 2DECOMP&FFT.  ...  The poor performance of the shared intra-node mode of operation is well known and has been the center of much focus, especially for supercomputers, which have been moving towards multicore designs, see  ...
doi:10.1016/j.jpdc.2019.02.006 fatcat:zamt5hl64jcmbch25r3kqu7zrm

ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems [chapter]

Jarek Nieplocha, Bryan Carpenter
1999 Lecture Notes in Computer Science  
By decoupling synchronization between the process that needs the data and the process that owns the data from the actual data transfer, implementation of parallel algorithms that operate on distributed  ...  In this paper, we describe the design, implementation, and experience with ARMCI (Aggregate Remote Memory Copy Interface), a new portable remote memory copy library we have been developing for optimized  ...  Figure 6: Timings for original MPI vs. new ARMCI implementation of remap. The operation is a particular redistribution of an N by N array.  ...
doi:10.1007/bfb0097937 fatcat:ehkwhem54naetmkbrdujjf3f3u

An Introduction to High Performance Fortran

John Merlin, Anthony Hey
1995 Scientific Programming  
This article provides a tutorial introduction to the main features of HPF.  ...  High Performance Fortran (HPF) is an informal standard for extensions to Fortran 90 to assist its implementation on parallel architectures, particularly for data-parallel computation.  ...  ACKNOWLEDGMENTS We would like to thank Bryan Carpenter for providing the Gaussian elimination example, John Eastmond for critically reading the manuscript, and Jerry Wagener for providing information and  ... 
doi:10.1155/1995/612973 fatcat:ayunnf5ojbdi5kttl257dhfbc4

Computing without processors

Satnam Singh
2012 Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems - CASES '12  
different targets. • Accelerator achieves this by constraining the data types used for parallel programming and by providing a restricted set of parallel array access operations. • Not all kinds of data-parallel  ...  it's difficult to map the arbitrary array indexing operations into efficient memory access operations on various targets. • A better way is to express this computation in terms of a whole array operation  ...
doi:10.1145/2380403.2380406 fatcat:32wfp6qs4rcq5fkwt6sa5xyh5i
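
The contrast the slides draw, whole-array operations versus arbitrary per-element indexing, can be illustrated with a 1-D stencil. This is a generic NumPy sketch, not Accelerator's API:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Arbitrary per-element indexing: hard to map onto restricted
# targets, because each access pattern must be analyzed.
out_indexed = np.array([(a[(i - 1) % 5] + a[(i + 1) % 5]) / 2
                        for i in range(5)])

# Whole-array form: the same stencil as shifts of the entire array,
# which restricted targets like GPUs can schedule efficiently.
out_whole = (np.roll(a, 1) + np.roll(a, -1)) / 2

assert np.allclose(out_indexed, out_whole)
```

Restricting programs to the second form is what lets a single description compile to the different targets the slides mention.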
