
Automatic speculative DOALL for clusters

Hanjun Kim, Nick P. Johnson, Jae W. Lee, Scott A. Mahlke, David I. August
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
Automatic parallelizing compilers such as SUIF [1, 19] and Polaris [3] parallelize a sequential program without the programmer's intervention.  ...  Parallelization APIs such as Cluster OpenMP [7] can help programmers parallelize sequential programs on clusters.  ...  The Spec-DOALL parallelizer begins by transforming the sequential loop in the same way as the DOALL parallelizer (Line 2). It then creates a basic block named recoverBB (Lines 3-4).  ... 
doi:10.1145/2259016.2259029 dblp:conf/cgo/KimJLMA12 fatcat:rekpv7ckurahrafuvatz55o2ha
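The Spec-DOALL snippet above refers to the classic DOALL transformation, in which a loop with no cross-iteration dependences is split across workers. The following C sketch is a minimal illustration under assumed names (worker, NTHREADS, the block distribution), not the paper's generated code, and it omits the speculation/recovery machinery around recoverBB; compile with cc doall.c -lpthread.

/* Minimal DOALL-style sketch: independent iterations of a sequential loop
 * are distributed across worker threads.  Hypothetical illustration only;
 * the actual Spec-DOALL transformation additionally inserts speculation
 * and recovery code (the recoverBB block mentioned in the snippet). */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double a[N], b[N], c[N];

struct range { long lo, hi; };

static void *worker(void *arg)
{
    struct range *r = arg;
    for (long i = r->lo; i < r->hi; i++)
        c[i] = a[i] + b[i];          /* no cross-iteration dependences */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct range rg[NTHREADS];

    for (long i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    /* Block distribution of the iteration space across threads. */
    for (int t = 0; t < NTHREADS; t++) {
        rg[t].lo = (long)t * N / NTHREADS;
        rg[t].hi = (long)(t + 1) * N / NTHREADS;
        pthread_create(&tid[t], NULL, worker, &rg[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}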

Programming MPSoC platforms: Road works ahead!

R. Leupers, A. Vajda, M. Bekooij, Soonhoi Ha, R. Domer, A. Nohl
2009 2009 Design, Automation & Test in Europe Conference & Exhibition  
Efficient utilization of the MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel  ...  On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands a more pragmatic, gradual tool  ...  For example, to expose explicit data parallelism in the model, the designer uses her/his application knowledge and invokes re-coding transformations to split loops into code partitions, analyze shared  ... 
doi:10.1109/date.2009.5090917 fatcat:dz4ubgggofc3dnfqlnyknucgsa

Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

Kathryn S. McKinley
1994 Proceedings of the 8th International Conference on Supercomputing - ICS '94  
With this metric, our algorithm improves or matches hand-coded parallel programs on shared-memory, bus-based parallel machines for eight of the nine programs in our test suite.  ...  The algorithm optimizes for data locality and parallelism, reducing or eliminating false sharing. It also uses interprocedural analysis and transformations to improve the granularity of parallelism.  ...  Paul Havlak's implementation of regular sections proved invaluable. To all of these people go my thanks.  ... 
doi:10.1145/181181.181265 dblp:conf/ics/McKinley94 fatcat:4eulalgo3rc5naym2lhb2wjl6i
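One concrete instance of the false sharing the abstract mentions is several threads updating accumulators that sit in the same cache line. The sketch below is a hypothetical example rather than anything from the paper: it pads per-thread slots to cache-line size so that neighbouring threads stop invalidating each other's lines. It assumes at most MAXTHREADS threads; compile with cc -fopenmp falseshare.c.

/* Hedged sketch of one locality issue the paper targets: false sharing.
 * Each thread accumulates into its own padded slot, so the slots live in
 * different cache lines and threads do not thrash each other's caches. */
#include <omp.h>
#include <stdio.h>

#define MAXTHREADS 64
#define CACHELINE  64

struct padded { double sum; char pad[CACHELINE - sizeof(double)]; };

int main(void)
{
    static struct padded partial[MAXTHREADS];   /* zero-initialized */
    double total = 0.0;

    #pragma omp parallel
    {
        int t = omp_get_thread_num();           /* assumes <= MAXTHREADS threads */
        #pragma omp for
        for (int i = 0; i < 10000000; i++)
            partial[t].sum += 1.0 / (i + 1.0);  /* no false sharing across t */
    }

    for (int t = 0; t < MAXTHREADS; t++)
        total += partial[t].sum;

    printf("total = %f\n", total);
    return 0;
}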

High Performance Air Pollution Simulation Using OpenMP

María J. Martín, Marta Parada, Ramón Doallo
2004 Journal of Supercomputing  
First of all, we optimize the sequential program with the aim of increasing data locality. Then, the optimized program is parallelized using OpenMP shared-memory directives.  ...  Experimental results on a 32-processor SGI Origin 2000 show that the parallel program achieves significant reductions in execution times.  ...  OpenMP is nowadays a 'de facto' standard for shared memory parallel programming. Using OpenMP, shared memory parallel programs can be made portable across a wide range of platforms.  ... 
doi:10.1023/b:supe.0000022102.00315.41 fatcat:uxy3pa6725dzfc4q4wwowicmjq
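As a minimal illustration of the OpenMP shared-memory directives the abstract refers to, the following sketch parallelizes a simple grid-averaging loop. The kernel, array sizes and names are assumptions chosen for illustration, not the paper's air-pollution model; compile with cc -fopenmp grid.c.

/* Illustrative use of an OpenMP work-sharing directive on a grid update. */
#include <stdio.h>

#define NX 1024
#define NY 1024

static double conc[NX][NY], next[NX][NY];

int main(void)
{
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            conc[i][j] = (i + j) % 7;

    /* Row-wise traversal keeps accesses contiguous (data locality);
     * the directive splits the outer loop across threads. */
    #pragma omp parallel for
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            next[i][j] = 0.25 * (conc[i-1][j] + conc[i+1][j] +
                                 conc[i][j-1] + conc[i][j+1]);

    printf("next[1][1] = %f\n", next[1][1]);
    return 0;
}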

Compiling for Scalable Multiprocessors with Polaris

Yunheung Paek, David A. Padua
1997 Parallel Processing Letters  
In this paper, we discuss our work at Illinois on the development of compiler techniques for scalable shared memory multiprocessors with noncoherent caches.  ...  Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines.  ...  Acknowledgements We would like to thank Cray Research Inc. and the Pittsburgh Supercomputing Center for granting machine time for the experiments reported in this paper.  ... 
doi:10.1142/s0129626497000413 fatcat:gta4lqs46raixl4hbppfpgbnay

Experience with a clustered parallel reduction machine

M Beemster, P.H Hartel, L.O Hertzberger, R.F.H Hofman, K.G Langendoen, L.L Li, R Milikowski, WG Vree, H.P Barendregt, J.C Mulder
1993 Future Generation Computer Systems  
It has been successfully applied to a number of algorithms, resulting in a benchmark of small and medium-size parallel functional programs.  ...  A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs.  ...  GR/F 35081, FAST: Functional programming for ArrayS of Transputers.  ... 
doi:10.1016/0167-739x(93)90011-d fatcat:ank45yd6czgm5hmbfoaqws33mm
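The divide-and-conquer parallelism mentioned above can be illustrated, very loosely, by a recursive decomposition whose two halves share no data. The C sketch below is only an analogy (the paper targets functional programs on a clustered reduction machine), but it shows the recursive structure whose independent subproblems a cluster could evaluate in parallel.

/* Divide-and-conquer decomposition, written as a sequential C sketch for
 * illustration only.  Each half of the recursion is independent, so the
 * two calls could be evaluated on different cluster nodes. */
#include <stdio.h>

static long dc_sum(const long *v, long lo, long hi)
{
    if (hi - lo <= 1)
        return (hi > lo) ? v[lo] : 0;
    long mid = lo + (hi - lo) / 2;
    /* The two recursive halves touch disjoint data. */
    return dc_sum(v, lo, mid) + dc_sum(v, mid, hi);
}

int main(void)
{
    long v[1000];
    for (long i = 0; i < 1000; i++) v[i] = i;
    printf("sum = %ld\n", dc_sum(v, 0, 1000));
    return 0;
}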

Eliminating synchronization bottlenecks using adaptive replication

Martin C. Rinard, Pedro C. Diniz
2003 ACM Transactions on Programming Languages and Systems  
Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that  ...  In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous referees of various versions of this article for their thoughtful and helpful comments.  ... 
doi:10.1145/641909.641911 fatcat:6ftcwn2lbbc3vhv2qb7spqujfm
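The replication idea described in the abstract, in rough outline: instead of serializing every update on one shared object, each thread updates a private replica and the replicas are combined once at the end. The sketch below is a hand-written, hypothetical illustration of that pattern with POSIX threads, not the code the compiler actually generates; compile with cc replicate.c -lpthread.

/* Replication sketch: the hot path updates a thread-private replica,
 * and only one combining update per thread takes the shared lock. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   1000000

static double shared_total = 0.0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    double replica = 0.0;               /* thread-private replica */
    for (int i = 0; i < NITERS; i++)
        replica += 1.0;                 /* no lock on the hot path */

    pthread_mutex_lock(&total_lock);    /* one combining update per thread */
    shared_total += replica;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("total = %f\n", shared_total);
    return 0;
}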

Advances in Engineering Software for Multicore Systems [chapter]

Ali Jannesari
2018 Dependability Engineering  
This chapter proposes a set of methods that employ an optimistic semi-automatic approach, which enables programmers to exploit parallelism on modern hardware architectures.  ...  Another contribution is a method for detecting code sections where parallel design patterns might be applicable and suggesting relevant code transformations.  ...  In simple cases, the program is automatically transformed into its parallel version based on available parallelism and the identified parallel design patterns [9] .  ... 
doi:10.5772/intechopen.72784 fatcat:rxkrgppehndvrnqzdafew3vfte

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Allen Leung, Nicolas Vasilache, Benoît Meister, Muthu Baskaran, David Wohlford, Cédric Bastoul, Richard Lethin
2010 Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10  
Communication and synchronizations operations at multiple levels are generated automatically. The resulting mapping is currently emitted in the CUDA programming language.  ...  The semantic transformations are expressed within the polyhedral model, including optimization of integrated parallelization, locality, and contiguity tradeoffs. Hierarchical tiling is performed.  ...  In addition to these, two basic transformations may be applied to make the produced code match the CUDA programming model.  ... 
doi:10.1145/1735688.1735698 dblp:conf/asplos/LeungVMBWBL10 fatcat:5vnudjbr6vae5mwktixizhywpy
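Tiling is one of the restructurings such a polyhedral mapper applies before assigning loops to GPU thread blocks and threads. The sketch below shows a plain, hand-tiled loop nest in C as an illustration of the transformation; the kernel and tile sizes are assumptions, and the real system derives the mapping automatically within the polyhedral model.

/* Hand-tiled y = A*x as a minimal illustration of loop tiling: the
 * (ii, jj) tile loops are the kind of loops a GPU mapper would assign
 * to thread blocks, with the intra-tile loops assigned to threads. */
#include <stdio.h>

#define N  512
#define TI 32
#define TJ 32

static double A[N][N], x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = (i == j) ? 2.0 : 0.0;
    }

    for (int ii = 0; ii < N; ii += TI)
        for (int jj = 0; jj < N; jj += TJ)
            for (int i = ii; i < ii + TI; i++)
                for (int j = jj; j < jj + TJ; j++)
                    y[i] += A[i][j] * x[j];

    printf("y[0] = %f\n", y[0]);
    return 0;
}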

Analysis and Evaluation of the Performance of CAPE

Van Long Tran, Eric Renault, Viet Hai Ha
2016 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld)  
CAPE is an approach based on checkpoints to allow the execution of OpenMP programs on distributed-memory architectures.  ...  On the other hand, OpenMP is very easy to use but is restricted to shared-memory architectures.  ...  It consists of a set of directives, functions and environment variables to easily support the transformation of a C, C++ or Fortran sequential program into a parallel program.  ... 
doi:10.1109/uic-atc-scalcom-cbdcom-iop-smartworld.2016.0104 dblp:conf/uic/TranRH16 fatcat:wg62kufkgzdq5lx2byp6bg4xf4

Domain-specific library generation for parallel software and hardware platforms

Franz Franchetti, Yevgen Voronenko, Peter A. Milder, Srinivas Chellappa, Marek R. Telgarsky, Hao Shen, Paolo D'Alberto, Frederic de Mesmay, James C. Hoe, Jose M. F. Moura, Markus Puschel
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
For the domain of linear transforms, Spiral automatically generates implementations for parallel platforms including SIMD vector extensions, multicore processors, field-programmable gate arrays (FPGAs)  ...  The performance of the generated code is competitive with the best available hand-written libraries.  ...  into shared memory code.  ... 
doi:10.1109/ipdps.2008.4536398 dblp:conf/ipps/FranchettiVMCTSDMHMP08 fatcat:f2o2k4tfhvhg7itarm4wznghte

A Modern Parallel Register Sharing Architecture for Code Compilation

Rajendra Kumar, Dr. P. K. Singh
2010 International Journal of Computer Applications  
Code generation for a parallel register sharing architecture involves issues that are not present in sequential code compilation and is inherently complex.  ...  On the implementation side, it has been observed that most applications are unable to exploit enough parallelism in the parallel register sharing architecture.  ...  First, powerful parallelizers should facilitate programming by allowing the development of much of the code in a familiar sequential programming language such as C.  ... 
doi:10.5120/334-505 fatcat:teqf26axmrekrbdriy7bnuoewq

Automatic C-to-CUDA Code Generation for Affine Programs [chapter]

Muthu Manikandan Baskaran, J. Ramanujam, P. Sadayappan
2010 Lecture Notes in Computer Science  
Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest.  ...  This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular (affine) programs.  ...  This work was supported in part by the U.S.  ... 
doi:10.1007/978-3-642-11970-5_14 fatcat:euk4pngadbcrfdzheclqlslahu
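For context, "regular (affine)" input means loop bounds and array subscripts that are affine functions of the enclosing loop indices, which is what makes exact dependence analysis and automatic CUDA mapping possible. The C nest below is an assumed example of that input class, not a kernel from the paper.

/* Example of an affine loop nest: every bound and subscript is an affine
 * function of the loop indices i, j, k, so dependences can be analyzed
 * exactly and the i/j loops can be mapped to GPU blocks and threads. */
#include <stdio.h>

#define N 256

static float A[N][N], B[N][N], C[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0f; B[i][j] = 2.0f; C[i][j] = 0.0f;
        }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}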

An MDE Approach for Automatic Code Generation from UML/MARTE to OpenCL

A. Wendell O. Rodrigues, Frederic Guyomarc'h, Jean-Luc Dekeyser
2013 Computing in Science & Engineering (Print)  
The most commonly used standards are OpenMP (Open Multi-Processing) for shared memory and the Message Passing Interface (MPI) for distributed-memory programming.  ...  In MDE, a model transformation is a compilation process that transforms a source model into a target model.  ...  Acknowledgments This work is part of the Gaspard2 project, developed by the Dynamic Adaptivity and Real-Time (DART)  ... 
doi:10.1109/mcse.2012.35 fatcat:rozgjomllbhlrhdycqx5a26tou

Analysis of Multithreaded Programs [chapter]

Martin Rinard
2001 Lecture Notes in Computer Science  
The field of program analysis has focused primarily on sequential programming languages.  ...  of weak memory consistency models.  ...  Although many of these analyses were originally developed for the automatic parallelization of sequential programs, the basic approaches should generalize to handle the appropriate kinds of multithreaded  ... 
doi:10.1007/3-540-47764-0_1 fatcat:rwavkysiwbcblen2hshcaw6weq
Showing results 1 — 15 out of 31,150 results