2,403 Hits in 4.9 sec

Titanium Performance and Potential: An NPB Experimental Study [chapter]

Kaushik Datta, Dan Bonachea, Katherine Yelick
2006 Lecture Notes in Computer Science  
Moreover, we have found that the Titanium implementations of three of the NAS Parallel Benchmarks can match or even exceed the performance of the standard Fortran/MPI implementations at realistic problem  ...  We present an overview of the language features and demonstrate their use in the context of the NAS Parallel Benchmarks, a standard suite of common scientific kernels.  ...  All of these features help raise the level of abstraction when compared to most serial languages commonly used in parallel computing.  ... 
doi:10.1007/978-3-540-69330-7_14 fatcat:zwatvttukjcu7csidw3qa2rlzu

A programmable preprocessor for parallelizing Fortran-90

Matt Rosing
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
A 4000 line preprocessor library has been written and used to parallelize roughly 50,000 lines of groundwater modeling code. The programs have been ported to a wide range of parallel architectures.  ...  A programmable preprocessor that generates portable and efficient parallel Fortran-90 code has been successfully used in the development of a variety of environmental transport simulators for the Department  ...  An example of this is generating communication code based on how an array is distributed and whether ghost cells need to be updated.  ... 
doi:10.1145/331532.331535 dblp:conf/sc/RosingY99 fatcat:6ulqwz7gonb5jo2bxovtfrefxu

Extensible PGAS semantics for C++

Nick Edmonds, Douglas Gregor, Andrew Lumsdaine
2010 Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model - PGAS '10  
Leveraging these capabilities of C++, we have implemented the Partitioned Global Property Map, a DSEL library supporting PGAS semantics, polymorphic partitioned global data structures, and a number of  ...  The Partitioned Global Address Space model combines the expression of data locality in SPMD applications, which is crucial to achieving good parallel performance, with the relative simplicity of the Distributed  ...  ghost cells.  ... 
doi:10.1145/2020373.2020385 dblp:conf/pgas/EdmondsGL10 fatcat:2gf57hdztjcwrhscq3oz333saq

Irregular Coarse-Grain Data Parallelism under LPARX

Scott R. Kohn, Scott B. Baden
1996 Scientific Programming  
LPARX provides structural abstraction, representing data decompositions as first-class objects that can be manipulated and modified at runtime.  ...  It supports coarse-grain data parallelism and gives the application complete control over specifying arbitrary block decompositions.  ...  This work was supported by NSF contract ASC-9110793 and ONR contract N00014-93-1-0152. Intel Paragon and Cray C-90 time were provided by a UCSD School of Engineering Block Grant.  ... 
doi:10.1155/1996/701628 fatcat:ln7pks2jxvcglbszomu2dvtbri

An adaptive mesh refinement benchmark for modern parallel programming languages

Tong Wen, Jimmy Su, Phillip Colella, Katherine Yelick, Noel Keen
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
languages, for examples, the PGAS languages: Co-Array Fortran, Unified Parallel C (UPC), and Titanium, as well as the recent HPCS languages: Chapel by Cray Inc., Fortress by Sun Microsystems, and X10  ...  dynamic load balancing, as well as fine-grained communications and irregular operations for updating grid boundaries in the adaptive mesh hierarchy.  ...  For the ghost cells not covered by fine grid 1, the corresponding ghost values are updated with certain interpolation procedure that may involve data from the coarse level and become location dependent  ... 
doi:10.1145/1362622.1362676 dblp:conf/sc/WenSCYK07 fatcat:xsmzwgmxiraetc64m4h2zt2eze

Automatic generation of parallel C code for stencil applications written in MATLAB

Johannes Spazier, Steffen Christgau, Bettina Schnor
2016 Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming - ARRAY 2016  
The generated parallel code of the Tsunami simulation reaches the performance of the available parallel reference implementations.  ...  This paper presents performance results of an automatic translation from a MATLAB subset into efficient parallelized C code for different architectures: multicores, compute clusters, and GPGPUs.  ...  The default MATLAB implementation omits a data type specification for the grid cells. Thus, the current type deduction of the compiler infers the grid cells to type int.  ... 
doi:10.1145/2935323.2935329 dblp:conf/pldi/SpazierCS16 fatcat:jxkflwc4czhh7c6mpzairp6v3e

Applications of HPJava [chapter]

Bryan Carpenter, Geoffrey Fox, Han-Ku Lee, Sang Boem Lim
2004 Lecture Notes in Computer Science  
We describe two applications of our HPJava language for parallel computing.  ...  The first is a multigrid solver for a Poisson equation, and the second is a CFD application that solves the Euler equations for inviscid flow.  ...  The optional arguments wlo, whi to Adlib.writeHalo() define the widths of the parts ghost regions that need updating (the default is to update the whole of the ghost regions of the array, whatever their  ... 
doi:10.1007/978-3-540-24644-2_10 fatcat:qhmoa5bhobf6ne6eraqazcsjkq

Efficient Implicit Parallel Patterns for Geographic Information System

Kevin Bourgeois, Sophie Robert, Sébastien Limet, Victor Essayan
2017 Procedia Computer Science  
These patterns are abstract models for a class of algorithms which can be customized and automatically transformed into a parallel execution.  ...  They are particularly used in geosciences and we illustrate them with the flow direction and the flow accumulation computations.  ...  It is sufficient to replace the lines 10, 19 and 26 with a way to memorize which cells have been added. In this way the new pattern is reduced to the cell updates and the ghost exchanges of m out .  ... 
doi:10.1016/j.procs.2017.05.235 fatcat:xva3gyebczbhvl6defq7766wta

A programming methodology for dual-tier multicomputers

S.B. Baden, S.J. Fink
2000 IEEE Transactions on Software Engineering  
KeLP's abstractions hide considerable detail without sacrificing performance, and dual-tier applications written in KeLP consistently outperform equivalent single-tier implementations written in MPI.  ...  KeLP2 supports two levels of locality and parallelism via hierarchical SPMD control flow, run-time geometric meta-data, and asynchronous collective communication.  ...  The authors would like to thank Paul Kelly and the anonymous referees for helpful suggestions on how to improve this paper.  ... 
doi:10.1109/32.842948 fatcat:mbs2jty2effw5dcvckrmulqlmq

Reusable Object-Oriented Solutions for Numerical Simulation of PDEs in a High Performance Environment

Andrea Lani, Tiago Quintino, Dries Kimpe, Herman Deconinck, Stefan Vandewalle, Stefaan Poedts
2006 Scientific Programming  
, that support extensibility and run-time flexibility in the implementation of physical models and generic numerical algorithms respectively.  ...  The paper presents solutions developed to effectively tackle these and other more specific problems (data handling and storage, implementation of physical models and numerical methods) that have arisen  ...  () const {return BaseClass::m_ptr->getGlobalSize();} }; Fig. 2. Cell-wise mesh partitioning that shows updatable and ghost states in the overlap region.  ... 
doi:10.1155/2006/393058 fatcat:cncrvkq63nf6zonazue5oqj2a4

Runtime support for scalable programming in Java

Sang Boem Lim, Hanku Lee, Bryan Carpenter, Geoffrey Fox
2007 Journal of Supercomputing  
So we fully support communication of intrinsic Java types, including primitive types, and Java object types.  ...  Our HPJava is based around a small set of language extensions designed to support parallel computation with distributed arrays, plus a set of communication libraries.  ...  More general forms of writeHalo may specify that only a subset of the available ghost area is to be updated, or may select cyclic wraparound for updating ghost cells at the extreme ends of the array.  ... 
doi:10.1007/s11227-007-0125-5 fatcat:nlmxxwvftvforg7rumfnyu5vve

Automatically Distributing Eulerian and Hybrid Fluid Simulations in the Cloud

Omid Mashayekhi, Chinmayee Shah, Hang Qu, Andrew Lim, Philip Levis
2018 ACM Transactions on Graphics  
ACKNOWLEDGMENTS First and foremost, we thank Ron Fedkiw and his research group, especially Saket Pakhar, Rahul Sheth, and David Hyde.  ...  Over the course of developing Nimbus, they have been tremendously helpful and always available to answer questions about simulation methods and PhysBAM.  ...  The largest ghost region has 3 × (256/4 − 2 × 3)² = 10,092 cells. Listing 1. Type definition for a float array application object.  ... 
doi:10.1145/3173551 fatcat:mlumvyz7xbfphahbrl53mx7bri

Zippy: A Framework for Computation and Visualization on a GPU Cluster

Zhe Fan, Feng Qiu, Arie E. Kaufman
2008 Computer graphics forum (Print)  
It abstracts the GPU cluster programming with a two-level parallelism hierarchy and a non-uniform memory access (NUMA) model.  ...  They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.  ...  Acknowledgements We wish to thank Jarek Nieplocha and Manojkumar Krishnan for discussing GA and Mike Houston for discussing parallel volume rendering. This work is supported by NSF grant CCF-0702699.  ... 
doi:10.1111/j.1467-8659.2008.01131.x fatcat:bchxru3j6ffdli36u57rjirhsm

Simulation of shallow-water systems using graphics processing units

Miguel Lastra, José M. Mantas, Carlos Ureña, Manuel J. Castro, José A. García-Rodríguez
2009 Mathematics and Computers in Simulation  
The potential data parallelism of this method is identified and the scheme is efficiently implemented on GPUs for one-layer shallow-water systems.  ...  Numerical experiments performed on several GPUs show the high efficiency of the GPU solver in comparison with a highly optimized implementation of a CPU solver.  ...  Lastra and C. Ureña also acknowledge partial support from DGI-MEC project TIN2004-07672-c03-02. M. Castro acknowledges partial support from DGI-MEC project MTM2006-08075.  ... 
doi:10.1016/j.matcom.2009.09.012 fatcat:aqy4mkk63jg35fn4ehkqu5ith4

Parallel Languages and Compilers: Perspective From the Titanium Experience

K. Yelick, P. Hilfinger, S. Graham, D. Bonachea, J. Su, A. Kamil, K. Datta, P. Colella, T. Wen
2007 The international journal of high performance computing applications  
types that are value types rather than reference types), operator overloading, and generic programming.  ...  We summarize results and lessons learned from implementing the NAS parallel benchmarks, elliptic and hyperbolic solvers using Adaptive Mesh Refinement, and several applications of the Immersed Boundary  ...  less abstraction and productivity features.  ... 
doi:10.1177/1094342007078449 fatcat:y52hkslgw5fbtjiyb3viwy3quq