Simplified parallel domain traversal
2011
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11
Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at ...
Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures. ...
We then cover related work in parallel flow tracing, which is our defining problem for domain traversal. We also review existing approaches in simplified large-data processing. ...
doi:10.1145/2063384.2063397
dblp:conf/sc/KendallWAPHE11
fatcat:l237y64r65bvnmlc47yf7rnsou
A Parallel Adaptive Cartesian PDE Solver Using Space–Filling Curves
[chapter]
2006
Lecture Notes in Computer Science
The implementation and parallel extension, using a space-filling curve to obtain a load-balanced domain decomposition, will be formalised. ...
In this paper, we present a parallel multigrid PDE solver working on adaptive hierarchical cartesian grids. ...
Grid Traversal Using a Peano Curve. Space-filling curves [15] are well known to simplify many different tasks, due to their good locality properties (e.g. [3, 5-8, 10, 12, 13]). ...
doi:10.1007/11823285_112
fatcat:ckd2msjirbbzlm3amw7hrg7tdy
GUIDE: Parallel library-centric application design by a generic scientific simulation environment
2009
International Journal of Parallel, Emergent and Distributed Systems
A parallel generic scientific simulation environment has been developed to ease this transition from single-core to multi-core systems without additional development activity. ...
The domain-specific part of a DSL can then be employed by a simplified grammar and an increased expressiveness. ...
applications suitable for multi-core processors by parallel components, thereby simplifying development, scalability, stabilisation, further support and parallelisation. ...
doi:10.1080/17445760902758545
fatcat:k2vwcnbvkjcivguvsb7fopftci
Triolet
2014
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14
We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23-100% of its performance on a 128-core cluster. ...
We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell ...
We enhance the functionality of indexers so that the library can partition arrays that are traversed in parallel. ...
doi:10.1145/2555243.2555268
dblp:conf/ppopp/RodriguesJDH14
fatcat:zhwz23c2pbfz3acjzoassqjuju
Triolet
2014
SIGPLAN Notices
We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23-100% of its performance on a 128-core cluster. ...
We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell ...
We enhance the functionality of indexers so that the library can partition arrays that are traversed in parallel. ...
doi:10.1145/2692916.2555268
fatcat:dinspgk56rc55c6cfe2x6pptcu
Parallel Computation of 2D Morse-Smale Complexes
2012
IEEE Transactions on Visualization and Computer Graphics
Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. ...
Simultaneously, the extrema-saddle connections are computed using a tree traversal algorithm on the GPU. ...
Thus the number of traversals that can be launched in parallel is limited by available memory. ...
doi:10.1109/tvcg.2011.284
pmid:22156106
fatcat:nr25f6zkzrarllwja7cicyawmi
Parallel Computation of 3D Morse-Smale Complexes
2012
Computer graphics forum (Print)
Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU. ...
Simultaneously, the extrema-saddle connections are computed using a tree traversal algorithm on the GPU. ...
Thus the number of traversals that can be launched in parallel is limited by available memory. ...
doi:10.1111/j.1467-8659.2012.03089.x
fatcat:d7vsjdbdfjd5bfe4lnxt4kw7ka
A Parallel Approach to Symbolic Traversal based on Set Partitioning
[chapter]
1997
IFIP Advances in Information and Communication Technology
implementation of the algorithms, as parallel architectures represent a natural environment to overcome these limitations. ...
Partitioning techniques and granularity of parallel tasks are discussed as a major issue to obtain a viable and efficient solution. Experimental results show the feasibility of the approach. ...
Table 1. Traversal results on some ISCAS'89 and ISCAS'89-addendum circuits. • indicates that we use a simplified version of the original circuit. Column Circuit gives the name of the ...
doi:10.1007/978-0-387-35190-2_11
fatcat:x36dclnderaq7iemo7p6fim27a
Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms
2016
2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)
their variants are widely used in scientific applications, their correct implementations are often difficult on such modern machines, as the algorithms are irregular, complex, and involve explicit task parallel ...
Tapas solves this by converting the user's clean implicit-style parallel program into an inspector-executor style code on heterogeneous multi-core, multi-node environment solely by the use of C++ template ...
In particular, domain specific languages have been shown to be effective by greatly simplifying the task of performance critical programming such as efficient parallelization and architecture-specific ...
doi:10.1109/icpads.2016.0145
dblp:conf/icpads/FukudaMMYTM16
fatcat:fiuddf2ce5b4dnd7x7nbyvl4ny
GPU accelerated Hybrid Tree Algorithm for Collision-less N-body Simulations
[article]
2014
arXiv
pre-print
For hard-force calculation, we can efficiently reduce the calculation and communication cost of the parallel tree code because we only need data of neighbor particles for this part. ...
To simplify communication for hard-force calculation, the region assigned to each process should be a cuboid. As the method of cuboid domain decomposition, we use the method introduced in [5]. ...
However, the implementation details of their tree traversal kernels and domain decomposition are different. ...
arXiv:1406.6158v1
fatcat:hk4rbeqouregvbxvzoykqvllim
Teuta: Tool Support for Performance Modeling of Distributed and Parallel Applications
[chapter]
2004
Lecture Notes in Computer Science
In addition, Teuta supports semantic model checking for the domain of high performance computing. For the generation of different model representations the model traversing is used. ...
In this paper we describe Teuta, which we have developed to provide tool support for the UML-based performance modeling of distributed and parallel applications. ...
Fig. 3 (diagram labels: MCL File, UML Rule Set, HPC Domain Rule Set). ...
doi:10.1007/978-3-540-24688-6_60
fatcat:4gc2v2lk2fee5fyehudlrfvibi
A Parallel Architecture for IISPH Fluids
[article]
2014
Workshop on Virtual Reality Interactions and Physical Simulations
We use orthogonal recursive bisection for domain decomposition and present a stable and fast converging load balancing controller. ...
Simultaneous communication and computation are used to minimize parallelization overhead. ...
To leverage the parallelism of cluster computers, ORB was used for domain decomposition. ...
doi:10.2312/vriphys.20141230
dblp:conf/vriphys/ThalerSG14
fatcat:4maj7ytiwvaqjfxo5fbmgv4rva
Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies
2020
Proceedings of the ACM on Programming Languages (PACMPL)
Many emerging DSLs used in performance demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process. ...
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. ...
This structure simplifies the traversal and implementation of more complex optimization strategies, as we will see in the following section. Confluence and Termination. ...
doi:10.1145/3408974
fatcat:f72dfpyvpja63nauihpuknf3ue
Combining spatial components and Hilbert transforms to interpret ground-time-domain electromagnetic data
2015
Geophysics
We have developed a method for displaying or imaging data from a ground-time-domain electromagnetic system and for extracting the geometric parameters of a small conductor. ...
The FFT is computed from the space domain to the wavenumber domain. The resulting Fourier domain signal H is subsequently converted to a causal sequence. ...
The response shape can be simplified using the EE as defined in equation 1. ...
doi:10.1190/geo2014-0528.1
fatcat:imy2lxsssfdl5kq5dr5pljrrvu
Distributed merge trees
2013
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13
As the growth of serial computational power has stalled, data analysis is becoming increasingly dependent on massively parallel machines. ...
We develop a distributed representation of the merge tree that avoids computing the global tree on a single processor and lets us parallelize subsequent queries. ...
Therefore, in a single traversal, it can identify both the persistent components and how much volume its local domain contributes to each one. ...
doi:10.1145/2442516.2442526
dblp:conf/ppopp/MorozovW13
fatcat:pmf535ncz5eqvcztssbcrtejvq