
Simplified parallel domain traversal

Wesley Kendall, Jingyuan Wang, Melissa Allen, Tom Peterka, Jian Huang, David Erickson
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis on - SC '11  
Inspired by MapReduce and other simplified parallel programming approaches, we have designed DStep, a flexible system that greatly simplifies efficient parallelization of domain traversal techniques at  ...  Many data-intensive scientific analysis techniques require global domain traversal, which over the years has been a bottleneck for efficient parallelization across distributed-memory architectures.  ...  We then cover related work in parallel flow tracing, which is our defining problem for domain traversal. We also review existing approaches in simplified large-data processing.  ... 
doi:10.1145/2063384.2063397 dblp:conf/sc/KendallWAPHE11 fatcat:l237y64r65bvnmlc47yf7rnsou

A Parallel Adaptive Cartesian PDE Solver Using Space–Filling Curves [chapter]

Hans-Joachim Bungartz, Miriam Mehl, Tobias Weinzierl
2006 Lecture Notes in Computer Science  
The implementation and parallel extension, using a space-filling curve to obtain a load balanced domain decomposition, will be formalised.  ...  In this paper, we present a parallel multigrid PDE solver working on adaptive hierarchical Cartesian grids.  ...  Grid Traversal Using a Peano Curve. Space-filling curves [15] are well known to simplify a lot of different tasks, due to their good locality properties ([3, 5-8, 10, 12, 13] e.g.).  ... 
doi:10.1007/11823285_112 fatcat:ckd2msjirbbzlm3amw7hrg7tdy

GUIDE: Parallel library-centric application design by a generic scientific simulation environment

René Heinzl, Philipp Schwaha, Franz Stimpfl, Siegfried Selberherr
2009 International Journal of Parallel, Emergent and Distributed Systems  
A parallel generic scientific simulation environment has been developed to ease this transition from single-core to multi-core systems without additional development activity.  ...  The domain-specific part of a DSL can then be employed by a simplified grammar and an increased expressiveness.  ...  applications suitable for multi-core processors by parallel components, thereby simplifying development, scalability, stabilisation, further support and parallelisation.  ... 
doi:10.1080/17445760902758545 fatcat:k2vwcnbvkjcivguvsb7fopftci

Triolet

Christopher Rodrigues, Thomas Jablin, Abdul Dakkak, Wen-Mei Hwu
2014 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14  
We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23-100% of its performance on a 128-core cluster.  ...  We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell  ...  We enhance the functionality of indexers so that the library can partition arrays that are traversed in parallel.  ... 
doi:10.1145/2555243.2555268 dblp:conf/ppopp/RodriguesJDH14 fatcat:zhwz23c2pbfz3acjzoassqjuju

Triolet

Christopher Rodrigues, Thomas Jablin, Abdul Dakkak, Wen-Mei Hwu
2014 SIGPLAN notices  
We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23-100% of its performance on a 128-core cluster.  ...  We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell  ...  We enhance the functionality of indexers so that the library can partition arrays that are traversed in parallel.  ... 
doi:10.1145/2692916.2555268 fatcat:dinspgk56rc55c6cfe2x6pptcu

Parallel Computation of 2D Morse-Smale Complexes

Nithin Shivashankar, Senthilnathan M, Vijay Natarajan
2012 IEEE Transactions on Visualization and Computer Graphics  
Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU.  ...  Simultaneously, the extrema-saddle connections are computed using a tree traversal algorithm on the GPU.  ...  Thus the number of traversals that can be launched in parallel is limited by available memory.  ... 
doi:10.1109/tvcg.2011.284 pmid:22156106 fatcat:nr25f6zkzrarllwja7cicyawmi

Parallel Computation of 3D Morse-Smale Complexes

Nithin Shivashankar, Vijay Natarajan
2012 Computer graphics forum (Print)  
Second, we describe a two-step graph traversal algorithm to compute the 1-saddle-2-saddle connections efficiently and in parallel on the CPU.  ...  Simultaneously, the extrema-saddle connections are computed using a tree traversal algorithm on the GPU.  ...  Thus the number of traversals that can be launched in parallel is limited by available memory.  ... 
doi:10.1111/j.1467-8659.2012.03089.x fatcat:d7vsjdbdfjd5bfe4lnxt4kw7ka

A Parallel Approach to Symbolic Traversal based on Set Partitioning [chapter]

G. Cabodi, P. Camurati, A. Lioy, M. Poncino, S. Quer
1997 IFIP Advances in Information and Communication Technology  
implementation of the algorithms, as parallel architectures represent a natural environment to overcome these limitations.  ...  Partitioning techniques and granularity of parallel tasks are discussed as a major issue to obtain a viable and efficient solution. Experimental results show the feasibility of the approach.  ...  Table 1. Traversal results on some ISCAS'89 and ISCAS'89-addendum circuits. • indicates that we use a simplified version of the original circuit. Column Circuit gives the name of the  ... 
doi:10.1007/978-0-387-35190-2_11 fatcat:x36dclnderaq7iemo7p6fim27a

Tapas: An Implicitly Parallel Programming Framework for Hierarchical N-Body Algorithms

Keisuke Fukuda, Motohiko Matsuda, Naoya Maruyama, Rio Yokota, Kenjiro Taura, Satoshi Matsuoka
2016 2016 IEEE 22nd International Conference on Parallel and Distributed Systems (ICPADS)  
their variants are widely used in scientific applications, their correct implementations are often difficult on such modern machines, as the algorithms are irregular, complex, and involve explicit task parallel  ...  Tapas solves this by converting the user's clean implicit-style parallel program into an inspector-executor style code on heterogeneous multi-core, multi-node environment solely by the use of C++ template  ...  In particular, domain specific languages have been shown to be effective by greatly simplifying the task of performance critical programming such as efficient parallelization and architecture-specific  ... 
doi:10.1109/icpads.2016.0145 dblp:conf/icpads/FukudaMMYTM16 fatcat:fiuddf2ce5b4dnd7x7nbyvl4ny

GPU accelerated Hybrid Tree Algorithm for Collision-less N-body Simulations [article]

Tsuyoshi Watanabe, Naohito Nakasato
2014 arXiv pre-print
For hard-force calculation, we can efficiently reduce the calculation and communication cost of the parallel tree code because we only need data of neighbor particles for this part.  ...  To simplify communication for hard-force calculation, a shape of a region of a process should be a cuboid. As the method of cuboid domain decomposition, we use the method introduced in [5].  ...  However, the implementation details of their tree traversal kernels and domain decomposition are different.  ... 
arXiv:1406.6158v1 fatcat:hk4rbeqouregvbxvzoykqvllim

Teuta: Tool Support for Performance Modeling of Distributed and Parallel Applications [chapter]

Thomas Fahringer, Sabri Pllana, Johannes Testori
2004 Lecture Notes in Computer Science  
In addition, Teuta supports semantic model checking for the domain of high performance computing. For the generation of different model representations the model traversing is used.  ...  In this paper we describe Teuta, which we have developed to provide tool support for the UML-based performance modeling of distributed and parallel applications.  ... 
doi:10.1007/978-3-540-24688-6_60 fatcat:4gc2v2lk2fee5fyehudlrfvibi

A Parallel Architecture for IISPH Fluids [article]

Felix Thaler, Barbara Solenthaler, Markus Gross
2014 Workshop on Virtual Reality Interactions and Physical Simulations  
We use orthogonal recursive bisection for domain decomposition and present a stable and fast converging load balancing controller.  ...  Simultaneous communication and computation are used to minimize parallelization overhead.  ...  To leverage the parallelism of cluster computers, ORB was used for domain decomposition.  ... 
doi:10.2312/vriphys.20141230 dblp:conf/vriphys/ThalerSG14 fatcat:4maj7ytiwvaqjfxo5fbmgv4rva

Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies

Bastian Hagedorn, Johannes Lenfers, Thomas Kœhler, Xueying Qin, Sergei Gorlatch, Michel Steuwer
2020 Proceedings of the ACM on Programming Languages (PACMPL)  
Many emerging DSLs used in performance demanding domains such as deep learning or high-performance image processing attempt to simplify or even fully automate the optimization process.  ...  Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications.  ...  This structure simplifies the traversal and implementation of more complex optimization strategies, as we will see in the following section. Confluence and Termination.  ... 
doi:10.1145/3408974 fatcat:f72dfpyvpja63nauihpuknf3ue

Combining spatial components and Hilbert transforms to interpret ground-time-domain electromagnetic data

Jacques K. Desmarais, Richard S. Smith
2015 Geophysics  
We have developed a method for displaying or imaging data from a ground-time-domain electromagnetic system and for extracting the geometric parameters of a small conductor.  ...  The FFT is computed from the space domain to the wavenumber domain. The resulting Fourier domain signal H is subsequently converted to a causal sequence.  ...  The response shape can be simplified using the EE as defined in equation 1.  ... 
doi:10.1190/geo2014-0528.1 fatcat:imy2lxsssfdl5kq5dr5pljrrvu

Distributed merge trees

Dmitriy Morozov, Gunther Weber
2013 Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13  
As the growth of serial computational power has stalled, data analysis is becoming increasingly dependent on massively parallel machines.  ...  We develop a distributed representation of the merge tree that avoids computing the global tree on a single processor and lets us parallelize subsequent queries.  ...  Therefore, in a single traversal, it can identify both the persistent components and how much volume its local domain contributes to each one.  ... 
doi:10.1145/2442516.2442526 dblp:conf/ppopp/MorozovW13 fatcat:pmf535ncz5eqvcztssbcrtejvq