A copy of this work (application/pdf) was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2006.
Titanium Performance and Potential: An NPB Experimental Study
[chapter]
2006
Lecture Notes in Computer Science
Moreover, we have found that the Titanium implementations of three of the NAS Parallel Benchmarks can match or even exceed the performance of the standard Fortran/MPI implementations at realistic problem ...
We present an overview of the language features and demonstrate their use in the context of the NAS Parallel Benchmarks, a standard suite of common scientific kernels. ...
All of these features help raise the level of abstraction when compared to most serial languages commonly used in parallel computing. ...
doi:10.1007/978-3-540-69330-7_14
fatcat:zwatvttukjcu7csidw3qa2rlzu
A programmable preprocessor for parallelizing Fortran-90
1999
Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99
A 4000 line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures. ...
A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department ...
An example of this is generating communication code based on how an array is distributed and whether ghost cells need to be updated. ...
doi:10.1145/331532.331535
dblp:conf/sc/RosingY99
fatcat:6ulqwz7gonb5jo2bxovtfrefxu
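The preprocessor entry above mentions generating communication code from how an array is distributed and whether ghost cells need updating. As a rough illustration only (not the preprocessor's actual Fortran-90 output), the Python sketch below derives the ghost-cell messages for a 1-D block distribution. The names `block_bounds` and `ghost_schedule` are hypothetical, and the sketch assumes non-periodic boundaries and that every block holds at least `ghost` cells.

```python
def block_bounds(n: int, nprocs: int, rank: int) -> tuple[int, int]:
    """Half-open [lo, hi) range of global indices owned by `rank` (block distribution)."""
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    return lo, lo + base + (1 if rank < extra else 0)


def ghost_schedule(n: int, nprocs: int, ghost: int) -> list[tuple[int, int, int, int]]:
    """List (src, dst, lo, hi) messages so each rank's ghost cells of width `ghost`
    receive its neighbours' boundary values; lo/hi are half-open global indices.
    Assumes non-periodic ends and blocks of at least `ghost` cells."""
    msgs: list[tuple[int, int, int, int]] = []
    if ghost == 0:  # no ghost cells requested: no communication is generated
        return msgs
    for r in range(nprocs):
        lo, hi = block_bounds(n, nprocs, r)
        if r > 0:  # the left neighbour needs our first `ghost` cells
            msgs.append((r, r - 1, lo, min(lo + ghost, hi)))
        if r < nprocs - 1:  # the right neighbour needs our last `ghost` cells
            msgs.append((r, r + 1, max(hi - ghost, lo), hi))
    return msgs
```

For 10 cells on 3 ranks with one ghost layer, rank 0 owns cells 0 to 3 and sends cell 3 to rank 1; with `ghost=0` the schedule is empty, mirroring the idea that communication depends on whether ghost cells are needed at all.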
Extensible PGAS semantics for C++
2010
Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10
Leveraging these capabilities of C++, we have implemented the Partitioned Global Property Map, a DSEL library supporting PGAS semantics, polymorphic partitioned global data structures, and a number of ...
The Partitioned Global Address Space model combines the expression of data locality in SPMD applications, which is crucial to achieving good parallel performance, with the relative simplicity of the Distributed ...
ghost cells. ...
doi:10.1145/2020373.2020385
dblp:conf/pgas/EdmondsGL10
fatcat:2gf57hdztjcwrhscq3oz333saq
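The PGAS entry describes global data structures whose storage is partitioned across processes while still being addressed by global index. A minimal single-process sketch of that mapping, assuming a simple block partition, is shown below; `PartitionedArray` is a hypothetical name and is not the Partitioned Global Property Map API from the paper.

```python
class PartitionedArray:
    """Toy global-view array whose storage is split into per-rank blocks.
    All 'ranks' live in one process here; a real PGAS runtime would turn
    remote accesses into communication."""

    def __init__(self, n: int, nprocs: int, fill=0):
        self.n, self.nprocs = n, nprocs
        self.block = -(-n // nprocs)  # ceil(n / nprocs) elements per rank
        self.parts = [[fill] * max(0, min(self.block, n - r * self.block))
                      for r in range(nprocs)]

    def owner(self, i: int) -> tuple[int, int]:
        """Map a global index to (owning rank, local offset)."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return divmod(i, self.block)

    def __getitem__(self, i: int):
        rank, off = self.owner(i)
        return self.parts[rank][off]

    def __setitem__(self, i: int, value) -> None:
        rank, off = self.owner(i)
        self.parts[rank][off] = value
```

The design choice worth noting is that user code only ever sees global indices; locality is exposed through `owner()`, so an SPMD program can still check which accesses are local.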
Irregular Coarse-Grain Data Parallelism under LPARX
1996
Scientific Programming
LPARX provides structural abstraction, representing data decompositions as first-class objects that can be manipulated and modified at runtime. ...
It supports coarse-grain data parallelism and gives the application complete control over specifying arbitrary block decompositions. ...
This work was supported by NSF contract ASC-9110793 and ONR contract N00014-93-1-0152. Intel Paragon and Cray C-90 time were provided by a UCSD School of Engineering Block Grant. ...
doi:10.1155/1996/701628
fatcat:ln7pks2jxvcglbszomu2dvtbri
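The LPARX entry stresses that data decompositions are first-class objects that the application can build and modify at run time, including arbitrary block decompositions. The sketch below assumes inclusive 2-D index boxes and represents a decomposition as an ordinary dictionary from box to owning rank; `Box` and `is_partition` are hypothetical names, not LPARX classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Inclusive 2-D index box [lo[0]..hi[0]] x [lo[1]..hi[1]]."""
    lo: tuple[int, int]
    hi: tuple[int, int]

    def size(self) -> int:
        return (max(0, self.hi[0] - self.lo[0] + 1)
                * max(0, self.hi[1] - self.lo[1] + 1))

    def intersect(self, other: "Box") -> "Box":
        return Box((max(self.lo[0], other.lo[0]), max(self.lo[1], other.lo[1])),
                   (min(self.hi[0], other.hi[0]), min(self.hi[1], other.hi[1])))


def is_partition(domain: Box, pieces: list[Box]) -> bool:
    """True if the pieces lie inside `domain`, do not overlap, and cover it exactly."""
    if any(p.intersect(domain) != p for p in pieces):
        return False
    for i, a in enumerate(pieces):
        for b in pieces[i + 1:]:
            if a.intersect(b).size() > 0:
                return False
    return sum(p.size() for p in pieces) == domain.size()


domain = Box((0, 0), (7, 7))
# An irregular, user-chosen block decomposition mapped to processor ranks.
decomp = {Box((0, 0), (7, 3)): 0, Box((0, 4), (3, 7)): 1, Box((4, 4), (7, 7)): 2}
# Because the decomposition is plain data, it can be changed at run time (e.g. rebalancing).
decomp[Box((4, 4), (7, 7))] = 0
```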
An adaptive mesh refinement benchmark for modern parallel programming languages
2007
Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07
languages, for example, the PGAS languages: Co-Array Fortran, Unified Parallel C (UPC), and Titanium, as well as the recent HPCS languages: Chapel by Cray Inc., Fortress by Sun Microsystems, and X10 ...
dynamic load balancing, as well as fine-grained communications and irregular operations for updating grid boundaries in the adaptive mesh hierarchy. ...
For the ghost cells not covered by fine grid 1, the corresponding ghost values are updated with certain interpolation procedure that may involve data from the coarse level and become location dependent ...
doi:10.1145/1362622.1362676
dblp:conf/sc/WenSCYK07
fatcat:xsmzwgmxiraetc64m4h2zt2eze
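The AMR benchmark snippet notes that ghost cells of a fine grid not covered by another fine grid are filled by interpolation that may use coarse-level data. The paper does not say which interpolation; the 1-D sketch below assumes cell-centred data and piecewise-linear interpolation from the coarse level, and it ignores the case where a sibling fine grid covers the ghost cells (those would be copied instead). All names are hypothetical.

```python
import math


def interp_coarse(coarse: list[float], dx_c: float, x: float) -> float:
    """Piecewise-linear interpolation of cell-centred coarse data at position x."""
    s = x / dx_c - 0.5  # position measured in units of coarse cell centres
    i = min(max(math.floor(s), 0), len(coarse) - 2)
    t = s - i
    return (1 - t) * coarse[i] + t * coarse[i + 1]


def fill_fine_ghosts(coarse, dx_c, ratio, fine_lo, n_fine, ghost):
    """Ghost values left and right of a fine patch covering fine cells
    [fine_lo, fine_lo + n_fine), with refinement ratio `ratio`."""
    dx_f = dx_c / ratio

    def centre(j: int) -> float:
        return (j + 0.5) * dx_f

    left = [interp_coarse(coarse, dx_c, centre(j))
            for j in range(fine_lo - ghost, fine_lo)]
    right = [interp_coarse(coarse, dx_c, centre(j))
             for j in range(fine_lo + n_fine, fine_lo + n_fine + ghost)]
    return left, right
```

The fine-cell centres depend on where the ghost cell sits relative to the coarse cells, which is one way to read the snippet's remark that the update becomes location dependent.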
Automatic generation of parallel C code for stencil applications written in MATLAB
2016
Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming - ARRAY 2016
The generated parallel code of the Tsunami simulation reaches the performance of the available parallel reference implementations. ...
This paper presents performance results of an automatic translation from a MATLAB subset into efficient parallelized C code for different architectures: multicores, compute clusters, and GPGPUs. ...
The default MATLAB implementation omits a data type specification for the grid cells. Thus, the current type deduction of the compiler infers the grid cells to be of type int. ...
doi:10.1145/2935323.2935329
dblp:conf/pldi/SpazierCS16
fatcat:jxkflwc4czhh7c6mpzairp6v3e
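The MATLAB-to-C entry notes that, without a type specification, the compiler infers the grid cells to be `int`. One plausible consequence for a stencil code, sketched below under that assumption, is that averaging truncates. The 5-point Jacobi sweep and the `as_int` flag are hypothetical illustrations, not the compiler's generated code; `as_int=True` emulates C integer division, which truncates toward zero.

```python
def jacobi_step(grid, as_int=False):
    """One 5-point Jacobi averaging sweep over interior cells; boundary cells are copied."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            s = grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            if as_int:
                # C-style integer division truncates toward zero
                out[i][j] = s // 4 if s >= 0 else -(-s // 4)
            else:
                out[i][j] = s / 4
    return out
```

With a single neighbour equal to 1 around a zero centre cell, the integer version keeps the centre at 0 while the floating-point version yields 0.25, so small updates can vanish entirely.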
Applications of HPJava
[chapter]
2004
Lecture Notes in Computer Science
We describe two applications of our HPJava language for parallel computing. ...
The first is a multigrid solver for a Poisson equation, and the second is a CFD application that solves the Euler equations for inviscid flow. ...
The optional arguments wlo, whi to Adlib.writeHalo() define the widths of the parts of the ghost regions that need updating (the default is to update the whole of the ghost regions of the array, whatever their ...
doi:10.1007/978-3-540-24644-2_10
fatcat:qhmoa5bhobf6ne6eraqazcsjkq
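The HPJava entry describes optional widths `wlo`, `whi` that limit how much of each ghost region is refreshed. The Python sketch below mimics that idea for 1-D block-distributed pieces held in one process; `write_halo` is a hypothetical stand-in, not the Adlib signature, and it assumes each piece is stored as `[ghost cells | interior | ghost cells]` with non-periodic ends.

```python
def write_halo(parts, ghost, wlo=None, whi=None):
    """Refresh ghost cells of 1-D pieces laid out as [ghost | interior | ghost].
    Only the `wlo` low-side and `whi` high-side ghost cells nearest the interior
    are updated; by default the whole ghost region of width `ghost` is updated."""
    wlo = ghost if wlo is None else wlo
    whi = ghost if whi is None else whi
    if not (0 <= wlo <= ghost and 0 <= whi <= ghost):
        raise ValueError("halo widths must not exceed the allocated ghost width")
    for r, buf in enumerate(parts):
        n_int = len(buf) - 2 * ghost
        if r > 0 and wlo:
            left = parts[r - 1]
            left_end = len(left) - ghost  # one past the neighbour's last interior cell
            buf[ghost - wlo:ghost] = left[left_end - wlo:left_end]
        if r < len(parts) - 1 and whi:
            right = parts[r + 1]
            buf[ghost + n_int:ghost + n_int + whi] = right[ghost:ghost + whi]
    return parts
```

Updating a narrower halo than was allocated saves communication when a stencil only reaches one cell on a given step.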
Efficient Implicit Parallel Patterns for Geographic Information System
2017
Procedia Computer Science
These patterns are abstract models for a class of algorithms which can be customized and automatically transformed in a parallel execution. ...
They are particularly used in geosciences and we illustrate them with the flow direction and the flow accumulation computations. ...
It is sufficient to replace lines 10, 19 and 26 with a way to memorize which cells have been added. In this way the new pattern is reduced to the cell updates and the ghost exchanges of m_out. ...
doi:10.1016/j.procs.2017.05.235
fatcat:xva3gyebczbhvl6defq7766wta
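The GIS entry illustrates its parallel patterns with flow direction and flow accumulation. The sequential Python sketch below uses the common D8 rule (each cell drains to its steepest-descent neighbour among eight) as an assumed method; the paper's exact definitions and its parallel, ghost-exchanging version are not reproduced, and the function names are hypothetical.

```python
import math

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flow_direction(dem):
    """D8: for each cell, the neighbour with the steepest downhill slope (None for pits)."""
    n, m = len(dem), len(dem[0])
    dirs = [[None] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            best, best_slope = None, 0.0
            for di, dj in NEIGHBOURS:
                a, b = i + di, j + dj
                if 0 <= a < n and 0 <= b < m:
                    slope = (dem[i][j] - dem[a][b]) / math.hypot(di, dj)
                    if slope > best_slope:
                        best, best_slope = (a, b), slope
            dirs[i][j] = best
    return dirs


def flow_accumulation(dem, dirs):
    """Number of cells (including itself) that drain through each cell."""
    n, m = len(dem), len(dem[0])
    acc = [[1] * m for _ in range(n)]
    # Flow only goes strictly downhill, so visiting cells from highest to lowest
    # guarantees every upstream contribution is complete before it is passed on.
    order = sorted(((dem[i][j], i, j) for i in range(n) for j in range(m)), reverse=True)
    for _, i, j in order:
        if dirs[i][j] is not None:
            a, b = dirs[i][j]
            acc[a][b] += acc[i][j]
    return acc
```

In a tiled parallel version, a tile's boundary cells depend on neighbouring tiles, which is where the ghost exchanges mentioned in the snippet come in.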
A programming methodology for dual-tier multicomputers
2000
IEEE Transactions on Software Engineering
KeLP's abstractions hide considerable detail without sacrificing performance, and dual-tier applications written in KeLP consistently outperform equivalent single-tier implementations written in MPI. ...
KeLP2 supports two levels of locality and parallelism via hierarchical SPMD control flow, run-time geometric meta-data, and asynchronous collective communication. ...
The authors would like to thank Paul Kelly and the anonymous referees for helpful suggestions on how to improve this paper. ...
doi:10.1109/32.842948
fatcat:mbs2jty2effw5dcvckrmulqlmq
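The KeLP2 entry describes two levels of locality and parallelism on dual-tier machines (nodes, and processors within a node). The sketch below assumes a 1-D domain and shows only the geometric side of that idea: a first-tier split across nodes, a second-tier split within each node, and a classification of block boundaries as shared-memory (intra-node) or message-passing (inter-node). The function names are hypothetical and not part of KeLP.

```python
def split(lo: int, hi: int, parts: int) -> list[tuple[int, int]]:
    """Split half-open [lo, hi) into `parts` nearly equal contiguous blocks."""
    base, extra = divmod(hi - lo, parts)
    out, start = [], lo
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        out.append((start, start + size))
        start += size
    return out


def two_level(n: int, nodes: int, procs_per_node: int) -> dict[int, list[tuple[int, int]]]:
    """Hierarchical decomposition: across nodes (tier 1), then within each node (tier 2)."""
    return {node: split(lo, hi, procs_per_node)
            for node, (lo, hi) in enumerate(split(0, n, nodes))}


def boundary_kinds(layout: dict[int, list[tuple[int, int]]]) -> list[str]:
    """Label each pair of adjacent blocks as 'intra' (same node) or 'inter' (different nodes)."""
    flat = [(node, blk) for node, blocks in layout.items() for blk in blocks]
    return ["intra" if a[0] == b[0] else "inter" for a, b in zip(flat, flat[1:])]
```

Only the "inter" boundaries would need network messages; "intra" boundaries can be serviced through shared memory, which is the distinction a single-tier MPI code cannot exploit.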
Reusable Object-Oriented Solutions for Numerical Simulation of PDEs in a High Performance Environment
2006
Scientific Programming
, that support extensibility and run-time flexibility in the implementation of physical models and generic numerical algorithms respectively. ...
The paper presents solutions developed to effectively tackle these and other more specific problems (data handling and storage, implementation of physical models and numerical methods) that have arisen ...
() const {return BaseClass::m_ptr->getGlobalSize();} };
Fig. 2. Cell-wise mesh partitioning that shows updatable and ghost states in the overlap region. ...
doi:10.1155/2006/393058
fatcat:cncrvkq63nf6zonazue5oqj2a4
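The figure caption in this entry distinguishes updatable and ghost states in the overlap region of a cell-wise mesh partition. A small sketch of how those sets can be derived, assuming a one-cell-deep overlap and a mesh given as a cell adjacency map, follows; `overlap_regions` is a hypothetical name, not code from the paper.

```python
def overlap_regions(adjacency: dict[int, list[int]], owner: dict[int, int]):
    """For a cell-wise partition, return {part: (updatable, ghost)}: a part updates the
    cells it owns, and its ghost cells are neighbours of those cells owned by another part."""
    result = {}
    for part in set(owner.values()):
        local = {c for c, p in owner.items() if p == part}
        ghost = {nb for c in local for nb in adjacency[c] if owner[nb] != part}
        result[part] = (local, ghost)
    return result
```

For a chain of six cells split three and three, each part ends up with one ghost cell: the first cell owned by its neighbour.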
Runtime support for scalable programming in Java
2007
Journal of Supercomputing
So we fully support communication of intrinsic Java types, including primitive types, and Java object types. ...
Our HPJava is based around a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries. ...
More general forms of writeHalo may specify that only a subset of the available ghost area is to be updated, or may select cyclic wraparound for updating ghost cells at the extreme ends of the array. ...
doi:10.1007/s11227-007-0125-5
fatcat:nlmxxwvftvforg7rumfnyu5vve
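The HPJava runtime entry mentions that `writeHalo` may select cyclic wraparound for ghost cells at the extreme ends of the array. The Python sketch below illustrates that option for 1-D pieces stored as `[ghost | interior | ghost]` in one process; `exchange` and its `cyclic` flag are hypothetical and do not reproduce the HPJava API.

```python
def exchange(parts, ghost, cyclic=False):
    """Fill ghost cells of 1-D pieces laid out as [ghost | interior | ghost].
    With cyclic=True the first and last pieces also exchange with each other
    (periodic wraparound); otherwise their outer ghost cells are left untouched."""
    p = len(parts)
    for r, buf in enumerate(parts):
        left, right = r - 1, r + 1
        if cyclic:
            left, right = left % p, right % p
        if 0 <= left < p:
            src = parts[left]
            buf[:ghost] = src[len(src) - 2 * ghost:len(src) - ghost]
        if 0 <= right < p:
            buf[len(buf) - ghost:] = parts[right][ghost:2 * ghost]
    return parts
```

Only interior cells are read and only ghost cells are written, so the order in which pieces are processed does not affect the result.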
Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud
2018
ACM Transactions on Graphics
ACKNOWLEDGMENTS First and foremost, we thank Ron Fedkiw and his research group, especially Saket Pakhar, Rahul Sheth, and David Hyde. ...
Over the course of developing Nimbus, they have been tremendously helpful and always available to answer questions about simulation methods and PhysBAM. ...
The largest ghost region has 3 × (256/4 − 2 × 3)² = 10,092 cells. Listing 1. Type definition for a float array application object. ...
doi:10.1145/3173551
fatcat:mlumvyz7xbfphahbrl53mx7bri
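The ghost-region size quoted in the Nimbus snippet can be checked step by step. This only evaluates the expression as printed; it does not attempt to explain how the 256/4 partitioning or the factor of 3 arise in the paper's setup.

```latex
3 \times \left(\tfrac{256}{4} - 2 \times 3\right)^2
  = 3 \times (64 - 6)^2
  = 3 \times 58^2
  = 3 \times 3364
  = 10{,}092
```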
Zippy: A Framework for Computation and Visualization on a GPU Cluster
2008
Computer Graphics Forum
It abstracts the GPU cluster programming with a two-level parallelism hierarchy and a non-uniform memory access (NUMA) model. ...
They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster. ...
Acknowledgements We wish to thank Jarek Nieplocha and Manojkumar Krishnan for discussing GA and Mike Houston for discussing parallel volume rendering. This work is supported by NSF grant CCF-0702699. ...
doi:10.1111/j.1467-8659.2008.01131.x
fatcat:bchxru3j6ffdli36u57rjirhsm
Simulation of shallow-water systems using graphics processing units
2009
Mathematics and Computers in Simulation
The potential data parallelism of this method is identified and the scheme is efficiently implemented on GPUs for one-layer shallow-water systems. ...
Numerical experiments performed on several GPUs show the high efficiency of the GPU solver in comparison with a highly optimized implementation of a CPU solver. ...
Lastra and C. Ureña also acknowledge partial support from DGI-MEC project TIN2004-07672-c03-02. M. Castro acknowledges partial support from DGI-MEC project MTM2006-08075. ...
doi:10.1016/j.matcom.2009.09.012
fatcat:aqy4mkk63jg35fn4ehkqu5ith4
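The shallow-water entry identifies data parallelism in a finite-volume scheme. The paper's scheme is not reproduced here; as a plainly swapped-in, simpler example, the sketch below takes one explicit step of the 1-D shallow-water equations over a flat bottom using a Rusanov (local Lax-Friedrichs) flux and periodic boundaries. Each interface flux depends only on its two adjacent cells, which is the kind of independence a GPU implementation can map to one thread per interface. Function names are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def flux(h: float, hu: float) -> tuple[float, float]:
    """Physical flux of the 1-D shallow-water equations for state (h, hu)."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h


def rusanov_step(h, hu, dx, dt):
    """One explicit finite-volume step with a Rusanov flux and periodic boundaries."""
    n = len(h)
    fh, fhu = [0.0] * n, [0.0] * n  # flux through the interface between cell i and i+1
    for i in range(n):  # independent per interface: the data-parallel part
        j = (i + 1) % n
        fl, fr = flux(h[i], hu[i]), flux(h[j], hu[j])
        s = max(abs(hu[i] / h[i]) + math.sqrt(G * h[i]),
                abs(hu[j] / h[j]) + math.sqrt(G * h[j]))
        fh[i] = 0.5 * (fl[0] + fr[0]) - 0.5 * s * (h[j] - h[i])
        fhu[i] = 0.5 * (fl[1] + fr[1]) - 0.5 * s * (hu[j] - hu[i])
    # fh[i - 1] with i = 0 wraps to the last interface, closing the periodic domain
    new_h = [h[i] - dt / dx * (fh[i] - fh[i - 1]) for i in range(n)]
    new_hu = [hu[i] - dt / dx * (fhu[i] - fhu[i - 1]) for i in range(n)]
    return new_h, new_hu
```

Because fluxes telescope on a periodic domain, total water volume is conserved up to rounding, and a lake at rest stays at rest.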
Parallel Languages and Compilers: Perspective From the Titanium Experience
2007
The International Journal of High Performance Computing Applications
types that are value types rather than reference types), operator overloading, and generic programming. ...
We summarize results and lessons learned from implementing the NAS parallel benchmarks, elliptic and hyperbolic solvers using Adaptive Mesh Refinement, and several applications of the Immersed Boundary ...
less abstraction and productivity features. ...
doi:10.1177/1094342007078449
fatcat:y52hkslgw5fbtjiyb3viwy3quq
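The Titanium entry lists value types (rather than reference types), operator overloading, and generic programming among the language's features. Titanium's own Java-dialect syntax is not shown here; the Python sketch below only approximates the ideas: a frozen dataclass gives value-style equality and immutability (Python still passes it by reference, so this is an analogy, not inline storage), `__add__`/`__mul__` provide overloaded operators, and a `TypeVar` gives a generic reduction. `Complex` and `total` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Complex:
    """Immutable value-like type: equality compares contents, fields cannot be reassigned."""
    re: float
    im: float

    def __add__(self, other: "Complex") -> "Complex":
        return Complex(self.re + other.re, self.im + other.im)

    def __mul__(self, other: "Complex") -> "Complex":
        return Complex(self.re * other.re - self.im * other.im,
                       self.re * other.im + self.im * other.re)


def total(xs: Iterable[T], zero: T) -> T:
    """Generic sum that works for any type supporting `+`."""
    acc = zero
    for x in xs:
        acc = acc + x
    return acc
```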
Showing results 1–15 of 2,403.