ReduxSTM: Optimizing STM designs for Irregular Applications

Manuel Pedrero, Eladio Gutierrez, Sergio Romero, Oscar Plata
2017 Journal of Parallel and Distributed Computing  
TM features can be leveraged to provide support for speculative parallel execution of irregular applications, characterized by a lack of knowledge about data dependences at compile-time.  ...  With this aim, ReduxSTM is introduced as a specific STM system designed by combining techniques for speculative execution with TM algorithms.  ...  This paper focuses on the challenge of parallelizing irregular applications using the speculative support provided by TM.  ... 
doi:10.1016/j.jpdc.2017.04.009 fatcat:4qlxt7blobb5jocj4e2p4mlpxi

Design and Implementation of a Parallel I/O Runtime System for Irregular Applications

Jaechun No, Sung-soon Park, Jesus Carretero Perez, Alok Choudhary
2002 Journal of Parallel and Distributed Computing  
In this paper we present the design, implementation and evaluation of a runtime system based on collective I/O techniques for irregular applications.  ...  We demonstrate that we can obtain significantly high performance for I/O above what has been possible so far.  ...  Introduction: Parallel computers are being used increasingly to solve large irregular applications with huge I/O requirements [5].  ... 
doi:10.1006/jpdc.2001.1788 fatcat:tw4ez33oljgwnne6a52szohx2e

A Shared Memory Parallel Block Streaming Model for Irregular Applications

Anup Zope, Edward Luke
2019 International Journal of Networking and Computing  
Due to worsening machine balance, a lightweight irregular application can utilize only a small fraction of the peak computational capacity on modern processors.  ...  Further, we experimentally demonstrate usefulness of the model and the transformations for static lightweight irregular computations such as those performed by a numerical partial differential equation  ...  Therefore, these applications are also static in addition to being lightweight and irregular.  ... 
doi:10.15803/ijnc.9.1_70 fatcat:gjpqhacngnfblg3zdeajnx424m

M-Tree: A parallel abstract data type for block-irregular adaptive applications [chapter]

Q. Wu, A. J. Field, P. H. J. Kelly
1997 Lecture Notes in Computer Science  
We would also like to extend our thanks to the Imperial/Fujitsu Parallel Computing Research Centre for their continued support.  ...  Using data abstraction to capture commonly-occurring computation forms within a class of applications has been well studied, in particular for irregular and dynamic applications.  ...  Introduction: In this paper we present an abstract data type called "M-Tree", a hierarchical data structure which is used for organizing block-irregular computations generated by recursive domain decomposition  ... 
doi:10.1007/bfb0002795 fatcat:cy7ufn6cgbf53ec6a5ddbxxq2q

On the Performance of Parallel Tasking Runtimes for an Irregular Fast Multipole Method Application [chapter]

Patrick Atkinson, Simon McIntosh-Smith
2017 Lecture Notes in Computer Science  
This paper will present our work on optimising and comparing the performance of an irregular algorithm for the increasingly important fast multipole method with the use of tasks.  ...  We also compare the performance of the chosen application between different OpenMP implementations and to other task-parallel programming models, finding that significant performance differences can be  ...  Acknowledgements: The authors would like to thank EPSRC for funding this work, as well as Bristol's Intel Parallel Computing Centre (IPCC) for access to the KNL platform.  ... 
doi:10.1007/978-3-319-65578-9_7 fatcat:ik4jitrvz5adlgav5mh5brouha

Automated Bug Detection for High-level Synthesis of Multi-threaded Irregular Applications

Pietro Fezzardi, Fabrizio Ferrandi
2020 ACM Transactions on Parallel Computing  
Things are even worse if the original specification is a multi-threaded program and the HLS tool is trying to generate a parallel architecture for an irregular application.  ...  This is one of the original contributions of this work, and it is the fundamental building block used in Section 6 to add support for debugging circuits generated from multi-threaded irregular applications  ...  For this reason, they are increasingly used in datacenters [40], High Performance Computing, and irregular applications [32].  ... 
doi:10.1145/3418086 fatcat:ixgetytmtjai7env263rt2kmy4

Design of a Method-Level Speculation framework for boosting irregular JVM applications

Ivo Anjo, João Cachopo
2016 Journal of Parallel and Distributed Computing  
Highlights: • JaSPEx-MLS: an automatic parallelization framework for JVM applications. • Uses Method-Level Speculation. • Custom STM extended with support for futures, value prediction, and captured  ...  To tackle this issue, we have developed JaSPEx-MLS: a software-based automatic parallelization framework targeted at sequential irregular Java/JVM applications, that is based on Method-Level Speculation  ...  All rights reserved. out most commodity multicores and many dynamic and irregular applications.  ... 
doi:10.1016/j.jpdc.2015.09.005 fatcat:65fbaq3abbdnngbaoazlpwry2q

Efficient Run-Time Support for Irregular Block-Structured Applications

Stephen J. Fink, Scott B. Baden, Scott R. Kohn
1998 Journal of Parallel and Distributed Computing  
KeLP: Kernel Lattice Parallelism. Library for higher level abstractions for managing data layout and data motion.  ...  Applications with dynamic block structures: uniform rectangular data arrays with irregular data motion.  ...  . • Inspector/executor appears in Multiblock PARTI, which does not allow irregular block decompositions and doesn't have the same level of structural abstraction. • Number of other related applications.  ... 
doi:10.1006/jpdc.1998.1437 fatcat:nghso34ejzaybaps6cnj7i3ocq

CUIRRE: An open-source library for load balancing and characterizing irregular applications on GPUs

Tao Zhang, Wei Shu, Min-You Wu
2014 Journal of Parallel and Distributed Computing  
In addition, CUIRRE can characterize irregular applications for their irregularity, thread granularity and GPU utilization.  ...  In this paper, we introduce a new library, CUIRRE, for improving performance of irregular applications on GPUs.  ...  Acknowledgment The authors would like to thank Linghe Kong, Xiaoyang Liu, Manuel Charlemagne and anonymous reviewers for their fruitful feedback and comments that have helped them improve the quality of  ... 
doi:10.1016/j.jpdc.2014.07.004 fatcat:x547teh6hne6poqr5vclwc5tqq

ABC2: Adaptively Balancing Computation and Communication in a DSM Cluster of Multicores for Irregular Applications

Sai Charan Koduru, Keval Vora, Rajiv Gupta
2014 2014 IEEE International Parallel & Distributed Processing Symposium Workshops  
Our analysis of several graph applications that rely on speculative parallelism or asynchronous parallelism shows that the balance between computation and communication differs between applications.  ...  We observe that the best configuration for above mechanisms varies across different inputs in addition to the variation across different applications.  ...  This work focusses on irregular applications; for speculative applications, we rely on the SpiceC [4] model, which is a lazy release, multiple-writer protocol.  ... 
doi:10.1109/ipdpsw.2014.51 dblp:conf/ipps/KoduruVG14 fatcat:pvnngtdd65aila7i5txfb6n6ca

On the Influence of Thread Allocation for Irregular Codes in NUMA Systems

Juan A. Lorenzo, Francisco F. Rivera, Peter Tuma, Juan C. Pichel
2009 2009 International Conference on Parallel and Distributed Computing, Applications and Technologies  
Results show that: (1) cores which share a socket can be considered as independent processors in this context; (2) for big data sizes, the effect of sharing a bus degrades the final performance but masks  ...  The main objective was to determine the performance effect of bus contention and cache coherency as well as the suitability of porting strategies regarding irregular codes in such a complex architecture  ...  In a recent work, a framework for automatic detection and application of the best mapping among threads and cores in parallel applications on multi-core systems was presented [5].  ... 
doi:10.1109/pdcat.2009.42 dblp:conf/pdcat/LorenzoRTP09 fatcat:zpolalcvvvd6ddir6n6wx3mlvu

Executing Optimized Irregular Applications Using Task Graphs within Existing Parallel Models

Christopher D. Krieger, Michelle Mills Strout, Jonathan Roelofs, Amanreet Bajwa
2012 2012 SC Companion: High Performance Computing, Networking Storage and Analysis  
Many sparse or irregular scientific computations are memory bound and benefit from locality improving optimizations such as blocking or tiling.  ...  We present performance and scalability results for 8 and 40 core shared memory systems on a sparse matrix iterative solver and a molecular dynamics benchmark.  ...  We also thank Samantha Wood of the University of California, San Diego for her contributions toward the algorithms used in this paper's evaluations.  ... 
doi:10.1109/sc.companion.2012.43 dblp:conf/sc/KriegerSRB12 fatcat:6yfyqol2knajpix5fj5k4pegoi

Parallel Depth-First Search for Directed Acyclic Graphs

Maxim Naumov, Alysson Vrielink, Michael Garland
2017 Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms - IA3'17  
Motivated by this, we propose a framework and corresponding algorithms for both edge insertion and deletion in general directed graphs.  ...  It is the basis of many graph algorithms such as computing strongly connected components, testing planarity, and detecting biconnected components. The result of a DFS is normally shown as a DFS-Tree.  ...  In this paper, we examine this problem, which is to update the DFS-Tree for an inserted or deleted edge. The aforementioned applications of DFS benefit from this study.  ... 
doi:10.1145/3149704.3149764 dblp:conf/sc/NaumovVG17 fatcat:6cf2usa3yzaj3el7ixi546lvoe

Fast Parallel Cosine K-Nearest Neighbor Graph Construction

David C. Anastasiu, George Karypis
2016 2016 6th Workshop on Irregular Applications: Architecture and Algorithms (IA3)  
Constructing the graph requires computing up to n^2 similarities for a set of n objects.  ...  In contrast, we leverage shared memory parallelism and recent advances in computing similarity joins to solve the problem exactly, via a filtering based approach.  ...  For full details on the filtering process, see [1]. In our parallel method, pL2Knng, threads concurrently process different query objects.  ... 
doi:10.1109/ia3.2016.013 dblp:conf/sc/AnastasiuK16 fatcat:ksvko6ybznf3tkwgeuecc7vyrq

Parallel simulation of dendritic growth on unstructured grids

Andreas Schäfer, Julian Hammer, Dietmar Fey
2011 Proceedings of the first workshop on Irregular applications: architectures and algorithm - IAAA '11  
doi:10.1145/2089142.2089148 dblp:conf/sc/SchaferHF11 fatcat:htzcc6blj5ejza2o42pb43neny