152,979 Hits in 3.4 sec

Synchronization transformations for parallel computing

Pedro C. Diniz, Martin C. Rinard
1999 Concurrency Practice and Experience  
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software.  ...  This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks.  ...  The tasks in fine-grain parallel computations, for example, need fast synchronization for efficient control of their frequent interactions.  ... 
doi:10.1002/(sici)1096-9128(199911)11:13<773::aid-cpe453>3.0.co;2-5 fatcat:kdm3brli5ngdlj4k36bpinomhu

Synchronization transformations for parallel computing

Pedro Diniz, Martin Rinard
1997 Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '97  
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software.  ...  This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks.  ...  The tasks in fine-grain parallel computations, for example, need fast synchronization for efficient control of their frequent interactions.  ... 
doi:10.1145/263699.263718 dblp:conf/popl/RinardD97 fatcat:awb5apjb2zcczcktubr524qm5i

Parallel Execution of ATL Transformation Rules [chapter]

Massimo Tisi, Salvador Martínez, Hassene Choura
2013 Lecture Notes in Computer Science  
While parallelization is one of the traditional ways of making computation systems scalable, developing parallel model transformations in a general-purpose language is a complex and error-prone task.  ...  We describe the implementation of a parallel transformation engine for the current version of the ATL language and experimentally evaluate the consequent gain in scalability.  ...  For this reason we look to a more coarse-grained decomposition for the transformation computation.  ... 
doi:10.1007/978-3-642-41533-3_40 fatcat:axz3kt5go5eunorqjk5hvxfari

Dancing with uncertainty

Sasa Misailovic, Stelios Sidiroglou, Martin C. Rinard
2012 Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability - RACES '12  
This set of transformations generates a space of alternative, possibly non-deterministic, parallel programs with varying performance and accuracy characteristics.  ...  relaxing synchronization primitives.  ...  Dubstep thus narrows its optimization focus to these parallel sections. Since these computations contain synchronization barriers, they are a good target for the opportunistic barrier transformation.  ... 
doi:10.1145/2414729.2414738 dblp:conf/oopsla/MisailovicSR12 fatcat:bdkib2hjsvdijeessogbkv53zq

Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs

Pedro C. Diniz, Martin C. Rinard
1998 Journal of Parallel and Distributed Computing  
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks.  ...  We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs.  ...  We have implemented these algorithms and integrated them into a parallelizing compiler for object-based languages.  ... 
doi:10.1006/jpdc.1998.1441 fatcat:xyd557ugkzb5vj7prwj74vi2vq

Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs [chapter]

Pedro Diniz, Martin Rinard
1997 Lecture Notes in Computer Science  
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks.  ...  We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs.  ...  We have implemented these algorithms and integrated them into a parallelizing compiler for object-based languages.  ... 
doi:10.1007/bfb0017259 fatcat:g7g6gkoyvzcchejky7rr3jjbje

Eliminating synchronization bottlenecks using adaptive replication

Martin C. Rinard, Pedro C. Diniz
2003 ACM Transactions on Programming Languages and Systems  
In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects.  ...  We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous referees of various versions of this article for their thoughtful and helpful comments.  ... 
doi:10.1145/641909.641911 fatcat:6ftcwn2lbbc3vhv2qb7spqujfm

Auto-CFD: efficiently parallelizing CFD applications on clusters

Li Xiao, Xiaodong Zhang, Zhengqian Kuang, Baiming Feng, Jichang Kang
2003 Proceedings IEEE International Conference on Cluster Computing CLUSTR-03  
Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many such applications have been shifted from expensive MPP boxes to cost-effective clusters.  ...  Auto-CFD is a pre-compiler which transforms Fortran CFD sequential programs to efficient message-passing parallel programs running on clusters. Our work has the following three unique contributions.  ...  The grid is then transformed into a computational grid in a regular shape, such as a rectangular grid.  ... 
doi:10.1109/clustr.2003.1253298 dblp:conf/cluster/XiaoZKFK03 fatcat:rgm5u7kezbdwfeudgqrd45ykyy

Parallelizing Sequential Programs with Statistical Accuracy Tests

Sasa Misailovic, Deokhwan Kim, Martin Rinard
2013 ACM Transactions on Embedded Computing Systems  
We present QuickStep, a novel system for parallelizing sequential programs.  ...  Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead designed to generate (potentially nondeterministic) parallel  ...  ACKNOWLEDGMENTS We would like to thank Dan Roy for his help with the statistical accuracy test and Stelios Sidiroglou and Danny Dig for their useful comments on the earlier drafts of this work.  ... 
doi:10.1145/2465787.2465790 fatcat:n5sq2veixnfu5e5d5lcqhds7xq

How do programs become more concurrent

Danny Dig, John Marrero, Michael D. Ernst
2011 Proceeding of the 4th international workshop on Multicore software engineering - IWMSE '11  
Our findings educate software developers on how to parallelize sequential programs, and provide hints for tool vendors about what transformations are worth automating.  ...  In the multi-core era, programmers need to resort to parallelism if they want to improve program performance. Thus, a major maintenance task will be to make sequential programs more concurrent.  ...  The authors thank Adam Kiezun, Stephen McCamant, Angeline Lee, Derek Rayside, and anonymous reviewers for providing helpful suggestions. Danny thanks Monika Dig, his greatest supporter.  ... 
doi:10.1145/1984693.1984700 fatcat:3sn3zh4befagtkiaa47t4up7m4

Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations

Anna Beletska, Wlodzimierz Bielecki, Albert Cohen, Marek Palkowski, Krzysztof Siedlecki
2009 2009 Eighth International Symposium on Parallel and Distributed Computing  
Automatic coarse-grained parallelization of program loops is of great importance for multi-core computing systems.  ...  We demonstrate that Iteration Space Slicing permits for extracting more coarse-grained parallelism in comparison to the Affine Transformation Framework.  ...  Affine transformations permit for the extraction of coarse-grained parallelism represented with synchronization-free threads.  ... 
doi:10.1109/ispdc.2009.15 dblp:conf/ispdc/BeletskaBCPS09 fatcat:ge2ju2saq5hpjf2cyyt2xfdyw4

A SHARED MEMORY BASED IMPLEMENTATION OF NEEDLEMAN-WUNSCH ALGORITHM USING SKEWING TRANSFORMATION

Vibha Patel
2017 International Journal of Advanced Research in Computer Science  
We present two parallel approaches of Needleman-Wunsch algorithm with single kernel and multi-kernel invocation using skewing transformation which is used for traversing and calculation of dynamic programming  ...  Among various algorithms for protein and nucleotide alignment, Needleman-Wunsch algorithm is widely accepted as it can divide the problem into sub-problems.  ...  After that skewing transformation is applied so that computation can be done in parallel. The computation results are then copied back to original dynamic programming matrix.  ... 
doi:10.26483/ijarcs.v8i9.4953 fatcat:wl5sn2g25ffnbfjwpvcfoffguq

Maximizing parallelism and minimizing synchronization with affine transforms

Amy W. Lim, Monica S. Lam
1997 Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '97  
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and affine  ...  The algorithm presented subsumes previously proposed loop transformation algorithms that are based on unimodular transformations, loop distribution, fusion, scaling, reindexing, and statement reordering  ...  For the different parallelization schemes, each thick line represents a barrier synchronization, and each gray box groups together computations that are assigned to the same processor.  ... 
doi:10.1145/263699.263719 dblp:conf/popl/LimL97 fatcat:etohvz56xnb6fmsdfbw4prvyuu

Heterogeneous Model Merging Based on Model Transformation

Hongtian Ma, Hehua Zhang, Ming Gu
2016 International Journal of Modeling and Optimization  
In this paper we propose a series of rules and mechanisms on model transformation from SyncBlock to the SR model of computation in Ptolemy II for heterogeneous model merging.  ...  In our previous work we proposed a system level design language named SyncBlock and developed a toolset for the design of synchronous embedded system.  ...  On the other hand, Ptolemy II [2], [3] defines many models of computation like Synchronous Reactive (SR), Discrete Event (DE), and synchronous dataflow (SDF) etc.  ... 
doi:10.7763/ijmo.2016.v6.500 fatcat:576itqj3izfrvkwqolvuehj5gy

Trasgo: a nested-parallel programming system

Arturo González-Escribano, Diego R. Llanos
2009 Journal of Supercomputing  
The approach allows the development of a modular compiler where automatic transformation techniques may exploit lower level and more complex synchronization structures, unlocking the limitations of pure  ...  Although their simple synchronization structure is appropriate to represent abstract parallel algorithms, it does not take into account many implementation issues.  ...  Synchronized parallel-for structures, teams of coarse threads, and task-queue schedulings.  ... 
doi:10.1007/s11227-009-0367-5 fatcat:mgdu46kgnjepbjs5gm3pg2bzxe