Synchronization transformations for parallel computing
1999
Concurrency Practice and Experience
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software. ...
This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks. ...
The tasks in fine-grain parallel computations, for example, need fast synchronization for efficient control of their frequent interactions. ...
doi:10.1002/(sici)1096-9128(199911)11:13<773::aid-cpe453>3.0.co;2-5
fatcat:kdm3brli5ngdlj4k36bpinomhu
Synchronization transformations for parallel computing
1997
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '97
As parallel machines become part of the mainstream computing environment, compilers will need to apply synchronization optimizations to deliver efficient parallel software. ...
This paper describes a new framework for synchronization optimizations and a new set of transformations for programs that implement critical sections using mutual exclusion locks. ...
The tasks in fine-grain parallel computations, for example, need fast synchronization for efficient control of their frequent interactions. ...
doi:10.1145/263699.263718
dblp:conf/popl/RinardD97
fatcat:awb5apjb2zcczcktubr524qm5i
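Both versions of this work target programs whose atomic sections are built from mutual exclusion locks. As a minimal sketch of that construct (not taken from the papers; the names `counter` and `worker` are illustrative), the C/pthreads fragment below shows the kind of fine-grain, lock-protected critical section such synchronization transformations aim to make cheaper.

```c
/* Hedged sketch only: a critical section implemented with a mutual
 * exclusion lock, the construct the transformations above operate on. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        /* Each iteration acquires and releases the lock; in fine-grain
         * computations this per-interaction overhead is what the
         * optimizations try to reduce. */
        pthread_mutex_lock(&lock);
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter);
    return 0;
}
```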
Parallel Execution of ATL Transformation Rules
[chapter]
2013
Lecture Notes in Computer Science
While parallelization is one of the traditional ways of making computation systems scalable, developing parallel model transformations in a general-purpose language is a complex and error-prone task. ...
We describe the implementation of a parallel transformation engine for the current version of the ATL language and experimentally evaluate the consequent gain in scalability. ...
For this reason we look to a more coarse-grained decomposition for the transformation computation. ...
doi:10.1007/978-3-642-41533-3_40
fatcat:axz3kt5go5eunorqjk5hvxfari
Dancing with uncertainty
2012
Proceedings of the 2012 ACM workshop on Relaxing synchronization for multicore and manycore scalability - RACES '12
This set of transformations generates a space of alternative, possibly non-deterministic, parallel programs with varying performance and accuracy characteristics. ...
relaxing synchronization primitives. ...
Dubstep thus narrows its optimization focus to these parallel sections. Since these computations contain synchronization barriers, they are a good target for the opportunistic barrier transformation. ...
doi:10.1145/2414729.2414738
dblp:conf/oopsla/MisailovicSR12
fatcat:bdkib2hjsvdijeessogbkv53zq
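The "opportunistic barrier transformation" mentioned above targets barrier-separated phases. As a rough, hypothetical illustration (this is not Dubstep's code; all names are invented), the sketch below marks the kind of barrier such a transformation might relax, trading determinism for less synchronization.

```c
/* Hedged sketch of a barrier-separated phase structure. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS 4
#define STEPS 100
static pthread_barrier_t phase_barrier;
static double cell[NTHREADS];

static void *iterate(void *arg)
{
    int id = (int)(intptr_t)arg;
    for (int step = 0; step < STEPS; step++) {
        cell[id] += 1.0;                                     /* phase 1: local update */
        pthread_barrier_wait(&phase_barrier);                /* candidate for relaxation */
        double left = cell[(id + NTHREADS - 1) % NTHREADS];  /* phase 2: read a neighbour */
        pthread_barrier_wait(&phase_barrier);
        cell[id] = 0.5 * (cell[id] + left);                  /* blend with the neighbour */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    pthread_barrier_init(&phase_barrier, NULL, NTHREADS);
    for (intptr_t i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, iterate, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("%f\n", cell[0]);
    pthread_barrier_destroy(&phase_barrier);
    return 0;
}
```

Removing the first barrier would let threads read a neighbour's value from either the current or the previous step, which changes results nondeterministically but may be acceptable for approximate computations.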
Lock Coarsening: Eliminating Lock Overhead in Automatically Parallelized Object-Based Programs
1998
Journal of Parallel and Distributed Computing
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. ...
We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs. ...
We have implemented these algorithms and integrated them into a parallelizing compiler for object-based languages. ...
doi:10.1006/jpdc.1998.1441
fatcat:xyd557ugkzb5vj7prwj74vi2vq
Lock coarsening: Eliminating lock overhead in automatically parallelized object-based programs
[chapter]
1997
Lecture Notes in Computer Science
Atomic operations are a key primitive in parallel computing systems. The standard implementation mechanism for atomic operations uses mutual exclusion locks. ...
We have implemented this technique in the context of a parallelizing compiler for irregular, object-based programs. ...
We have implemented these algorithms and integrated them into a parallelizing compiler for object-based languages. ...
doi:10.1007/bfb0017259
fatcat:g7g6gkoyvzcchejky7rr3jjbje
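Both versions of this work describe raising the granularity at which locks are acquired. A hedged illustration of the general idea follows, with hypothetical names and no connection to the compiler described in the papers: repeated acquire/release of the same lock is merged into a single acquisition around the enclosing region.

```c
/* Sketch of the general lock coarsening idea (illustrative only). */
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
double sum = 0.0;

/* Before: one lock/unlock pair per iteration. */
void add_all_fine(const double *a, int n)
{
    for (int i = 0; i < n; i++) {
        pthread_mutex_lock(&m);
        sum += a[i];
        pthread_mutex_unlock(&m);
    }
}

/* After coarsening: one lock/unlock pair for the whole loop,
 * trading lock overhead for a longer critical section. */
void add_all_coarse(const double *a, int n)
{
    pthread_mutex_lock(&m);
    for (int i = 0; i < n; i++)
        sum += a[i];
    pthread_mutex_unlock(&m);
}
```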
Eliminating synchronization bottlenecks using adaptive replication
2003
ACM Transactions on Programming Languages and Systems
In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects. ...
We have implemented adaptive replication in the context of a parallelizing compiler for a subset of C++. ...
ACKNOWLEDGMENTS We would like to thank the anonymous referees of various versions of this article for their thoughtful and helpful comments. ...
doi:10.1145/641909.641911
fatcat:6ftcwn2lbbc3vhv2qb7spqujfm
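The replication idea can be pictured as giving each thread its own copy of a frequently updated datum and combining the copies afterwards. The sketch below only illustrates that general pattern for a scalar accumulator (all names are hypothetical and this is not the compiler's output); the adaptive scheme described in the abstract additionally decides when replication is worthwhile.

```c
/* Sketch: per-thread replicas replace a lock-protected accumulator. */
#include <pthread.h>

#define NTHREADS 4
static double replica[NTHREADS];     /* one accumulator per thread */

struct task { int id; const double *a; int lo, hi; };

static void *accumulate(void *p)
{
    struct task *t = p;
    for (int i = t->lo; i < t->hi; i++)
        replica[t->id] += t->a[i];   /* no lock needed on a private copy */
    return NULL;
}

double parallel_sum(const double *a, int n)
{
    pthread_t th[NTHREADS];
    struct task tasks[NTHREADS];
    int chunk = (n + NTHREADS - 1) / NTHREADS;
    for (int i = 0; i < NTHREADS; i++) {
        int lo = i * chunk, hi = lo + chunk;
        replica[i] = 0.0;
        tasks[i] = (struct task){ i, a, lo, hi > n ? n : hi };
        pthread_create(&th[i], NULL, accumulate, &tasks[i]);
    }
    double total = 0.0;
    for (int i = 0; i < NTHREADS; i++) {
        pthread_join(th[i], NULL);
        total += replica[i];         /* combine the replicas after the joins */
    }
    return total;
}
```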
Auto-CFD: efficiently parallelizing CFD applications on clusters
2003
Proceedings IEEE International Conference on Cluster Computing CLUSTR-03
Computational Fluid Dynamics (CFD) applications are highly demanding for parallel computing. Many such applications have been shifted from expensive MPP boxes to cost-effective clusters. ...
Auto-CFD is a pre-compiler which transforms Fortran CFD sequential programs to efficient message-passing parallel programs running on clusters. Our work has the following three unique contributions. ...
The grid is then transformed into a computational grid in a regular shape, such as a rectangular grid. ...
doi:10.1109/clustr.2003.1253298
dblp:conf/cluster/XiaoZKFK03
fatcat:rgm5u7kezbdwfeudgqrd45ykyy
Parallelizing Sequential Programs with Statistical Accuracy Tests
2013
ACM Transactions on Embedded Computing Systems
We present QuickStep, a novel system for parallelizing sequential programs. ...
Unlike standard parallelizing compilers (which are designed to preserve the semantics of the original sequential computation), QuickStep is instead designed to generate (potentially nondeterministic) parallel ...
ACKNOWLEDGMENTS We would like to thank Dan Roy for his help with the statistical accuracy test and Stelios Sidiroglou and Danny Dig for their useful comments on the earlier drafts of this work. ...
doi:10.1145/2465787.2465790
fatcat:n5sq2veixnfu5e5d5lcqhds7xq
How do programs become more concurrent
2011
Proceeding of the 4th international workshop on Multicore software engineering - IWMSE '11
Our findings educate software developers on how to parallelize sequential programs, and provide hints for tool vendors about what transformations are worth automating. ...
In the multi-core era, programmers need to resort to parallelism if they want to improve program performance. Thus, a major maintenance task will be to make sequential programs more concurrent. ...
The authors thank Adam Kiezun, Stephen McCamant, Angeline Lee, Derek Rayside, and anonymous reviewers for providing helpful suggestions. Danny thanks Monika Dig, his greatest supporter. ...
doi:10.1145/1984693.1984700
fatcat:3sn3zh4befagtkiaa47t4up7m4
Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations
2009
2009 Eighth International Symposium on Parallel and Distributed Computing
Automatic coarse-grained parallelization of program loops is of great importance for multi-core computing systems. ...
We demonstrate that Iteration Space Slicing permits extracting more coarse-grained parallelism in comparison to the Affine Transformation Framework. ...
Affine transformations permit the extraction of coarse-grained parallelism represented with synchronization-free threads. ...
doi:10.1109/ispdc.2009.15
dblp:conf/ispdc/BeletskaBCPS09
fatcat:ge2ju2saq5hpjf2cyyt2xfdyw4
A SHARED MEMORY BASED IMPLEMENTATION OF NEEDLEMAN-WUNSCH ALGORITHM USING SKEWING TRANSFORMATION
2017
International Journal of Advanced Research in Computer Science
We present two parallel approaches of Needleman-Wunsch algorithm with single kernel and multi-kernel invocation using skewing transformation which is used for traversing and calculation of dynamic programming ...
Among various algorithms for protein and nucleotide alignment, Needleman-Wunsch algorithm is widely accepted as it can divide the problem into sub-problems. ...
After that, the skewing transformation is applied so that the computation can be done in parallel. The computation results are then copied back to the original dynamic programming matrix. ...
doi:10.26483/ijarcs.v8i9.4953
fatcat:wl5sn2g25ffnbfjwpvcfoffguq
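The skewing transformation referred to here reorders the dynamic-programming computation so that independent cells are grouped together. A common way to picture it is the wavefront form of Needleman-Wunsch, sketched below with hypothetical scoring constants (this is not the paper's kernel): each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal are independent and can be filled in parallel.

```c
/* Hedged wavefront sketch of Needleman-Wunsch matrix fill.
 * H is an (n+1) x (m+1) row-major matrix; row 0 and column 0 are
 * assumed to hold the usual gap-penalty initialisation.
 * Compile with OpenMP to parallelise each anti-diagonal. */
#define GAP   (-1)
#define MATCH   1
#define MISS  (-1)

static int max3(int a, int b, int c) { int m = a > b ? a : b; return m > c ? m : c; }

void nw_fill(int *H, int n, int m, const char *s1, const char *s2)
{
    for (int d = 2; d <= n + m; d++) {            /* anti-diagonals, in dependence order */
        int i_lo = d - m > 1 ? d - m : 1;
        int i_hi = d - 1 < n ? d - 1 : n;
        #pragma omp parallel for                   /* cells on one diagonal are independent */
        for (int i = i_lo; i <= i_hi; i++) {
            int j = d - i;
            int sub = s1[i - 1] == s2[j - 1] ? MATCH : MISS;
            H[i * (m + 1) + j] = max3(H[(i - 1) * (m + 1) + (j - 1)] + sub,
                                      H[(i - 1) * (m + 1) + j] + GAP,
                                      H[i * (m + 1) + (j - 1)] + GAP);
        }
    }
}
```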
Maximizing parallelism and minimizing synchronization with affine transforms
1997
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '97
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and affine ...
The algorithm presented subsumes previously proposed loop transformation algorithms that are based on unimodular transformations, loop distribution, fusion, scaling, reindexing, and statement reordering ...
For the different parallelization schemes, each thick line represents a barrier synchronization, and each gray box groups together computations that are assigned to the same processor. ...
doi:10.1145/263699.263719
dblp:conf/popl/LimL97
fatcat:etohvz56xnb6fmsdfbw4prvyuu
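An affine partition maps each loop iteration to a processor using an affine function of the loop indices; the goal described in the abstract is to choose that function so as to maximize parallelism while minimizing synchronization between processors. A trivial, hypothetical example of the best case (synchronization-free parallelism) is below; the paper's algorithm finds such mappings for arbitrary loop nests and affine accesses, which this sketch does not attempt.

```c
/* Illustrative only: rows of A are independent, so the affine mapping
 * p(i, j) = i assigns whole rows to processors and needs no barriers. */
void scale_rows(double *A, int n, int m, const double *f)
{
    #pragma omp parallel for            /* each row goes to one processor */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++)
            A[i * m + j] *= f[i];
}
```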
Heterogeneous Model Merging Based on Model Transformation
2016
International Journal of Modeling and Optimization
In this paper we propose a series of rules and mechanisms on model transformation from SyncBlock to the SR model of computation in Ptolemy II for heterogeneous model merging. ...
In our previous work we proposed a system level design language named SyncBlock and developed a toolset for the design of synchronous embedded system. ...
On the other hand, Ptolemy II [2], [3] defines many models of computation like Synchronous Reactive (SR), Discrete Event (DE), and synchronous dataflow (SDF), etc. ...
doi:10.7763/ijmo.2016.v6.500
fatcat:576itqj3izfrvkwqolvuehj5gy
Trasgo: a nested-parallel programming system
2009
Journal of Supercomputing
The approach allows the development of a modular compiler where automatic transformation techniques may exploit lower level and more complex synchronization structures, unlocking the limitations of pure ...
Although their simple synchronization structure is appropriate to represent abstract parallel algorithms, it does not take into account many implementation issues. ...
Synchronized parallel-for structures, teams of coarse threads, and task-queue schedulings. ...
doi:10.1007/s11227-009-0367-5
fatcat:mgdu46kgnjepbjs5gm3pg2bzxe