
Automatic speculative DOALL for clusters

Hanjun Kim, Nick P. Johnson, Jae W. Lee, Scott A. Mahlke, David I. August
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
Automatic parallelizing compilers such as SUIF [1, 19] and Polaris [3] parallelize a sequential program without the programmer's intervention.  ...  Parallelization APIs such as Cluster OpenMP [7] can help programmers parallelize sequential programs on clusters.  ...  The Spec-DOALL parallelizer begins by transforming the sequential loop in the same way as the DOALL parallelizer (Line 2). It then creates a basic block named recoverBB (Lines 3-4).  ... 
doi:10.1145/2259016.2259029 dblp:conf/cgo/KimJLMA12 fatcat:rekpv7ckurahrafuvatz55o2ha
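The Spec-DOALL snippet above refers to the classic DOALL transformation, in which a loop with no cross-iteration dependences is split across workers. The following C sketch is a minimal illustration under assumed names (worker, NTHREADS, the block distribution), not the paper's generated code, and it omits the speculation/recovery machinery around recoverBB; compile with cc doall.c -lpthread.

/* Minimal DOALL-style sketch: independent iterations of a sequential loop
 * are distributed across worker threads.  Hypothetical illustration only;
 * the actual Spec-DOALL transformation additionally inserts speculation
 * and recovery code (the recoverBB block mentioned in the snippet). */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double a[N], b[N], c[N];

struct range { long lo, hi; };

static void *worker(void *arg)
{
    struct range *r = arg;
    for (long i = r->lo; i < r->hi; i++)
        c[i] = a[i] + b[i];          /* no cross-iteration dependences */
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    struct range rg[NTHREADS];

    for (long i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    /* Block distribution of the iteration space across threads. */
    for (int t = 0; t < NTHREADS; t++) {
        rg[t].lo = (long)t * N / NTHREADS;
        rg[t].hi = (long)(t + 1) * N / NTHREADS;
        pthread_create(&tid[t], NULL, worker, &rg[t]);
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}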

Programming MPSoC platforms: Road works ahead!

R. Leupers, A. Vajda, M. Bekooij, Soonhoi Ha, R. Domer, A. Nohl
2009 2009 Design, Automation & Test in Europe Conference & Exhibition  
Efficient utilization of the MPSoC HW resources demands radically new models and corresponding SW development tools, capable of exploiting the available parallelism and guaranteeing bug-free parallel  ...  On the other hand, at least for coming years, the freedom for disruptive programming technologies is limited by the huge amount of certified sequential code that demands a more pragmatic, gradual tool  ...  For example, to expose explicit data parallelism in the model, the designer uses her/his application knowledge and invokes re-coding transformations to split loops into code partitions, analyze shared  ... 
doi:10.1109/date.2009.5090917 fatcat:dz4ubgggofc3dnfqlnyknucgsa

Evaluating automatic parallelization for efficient execution on shared-memory multiprocessors

Kathryn S. McKinley
1994 Proceedings of the 8th International Conference on Supercomputing - ICS '94  
With this metric, our algorithm improves or matches hand-coded parallel programs on shared-memory, bus-based parallel machines for eight of the nine programs in our test suite.  ...  The algorithm optimizes for data locality and parallelism, reducing or eliminating false sharing. It also uses interprocedural analysis and transformations to improve the granularity of parallelism.  ...  Paul Havlak's implementation of regular sections proved invaluable. To all of these people go my thanks.  ... 
doi:10.1145/181181.181265 dblp:conf/ics/McKinley94 fatcat:4eulalgo3rc5naym2lhb2wjl6i
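One concrete instance of the false sharing the abstract mentions is several threads updating accumulators that sit in the same cache line. The sketch below is a hypothetical example rather than anything from the paper: it pads per-thread slots to cache-line size so that neighbouring threads stop invalidating each other's lines. It assumes at most MAXTHREADS threads; compile with cc -fopenmp falseshare.c.

/* Hedged sketch of one locality issue the paper targets: false sharing.
 * Each thread accumulates into its own padded slot, so the slots live in
 * different cache lines and threads do not thrash each other's caches. */
#include <omp.h>
#include <stdio.h>

#define MAXTHREADS 64
#define CACHELINE  64

struct padded { double sum; char pad[CACHELINE - sizeof(double)]; };

int main(void)
{
    static struct padded partial[MAXTHREADS];   /* zero-initialized */
    double total = 0.0;

    #pragma omp parallel
    {
        int t = omp_get_thread_num();           /* assumes <= MAXTHREADS threads */
        #pragma omp for
        for (int i = 0; i < 10000000; i++)
            partial[t].sum += 1.0 / (i + 1.0);  /* no false sharing across t */
    }

    for (int t = 0; t < MAXTHREADS; t++)
        total += partial[t].sum;

    printf("total = %f\n", total);
    return 0;
}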

High Performance Air Pollution Simulation Using OpenMP

María J. Martín, Marta Parada, Ramón Doallo
2004 Journal of Supercomputing  
First of all, we optimize the sequential program with the aim of increasing data locality. Then, the optimized program is parallelized using OpenMP shared-memory directives.  ...  Experimental results on a 32-processor SGI Origin 2000 show that the parallel program achieves significant reductions in execution times.  ...  OpenMP is nowadays a 'de facto' standard for shared memory parallel programming. Using OpenMP, shared memory parallel programs can be made portable across a wide range of platforms.  ... 
doi:10.1023/b:supe.0000022102.00315.41 fatcat:uxy3pa6725dzfc4q4wwowicmjq
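As a minimal illustration of the OpenMP shared-memory directives the abstract refers to, the following sketch parallelizes a simple grid-averaging loop. The kernel, array sizes and names are assumptions chosen for illustration, not the paper's air-pollution model; compile with cc -fopenmp grid.c.

/* Illustrative use of an OpenMP work-sharing directive on a grid update. */
#include <stdio.h>

#define NX 1024
#define NY 1024

static double conc[NX][NY], next[NX][NY];

int main(void)
{
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            conc[i][j] = (i + j) % 7;

    /* Row-wise traversal keeps accesses contiguous (data locality);
     * the directive splits the outer loop across threads. */
    #pragma omp parallel for
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            next[i][j] = 0.25 * (conc[i-1][j] + conc[i+1][j] +
                                 conc[i][j-1] + conc[i][j+1]);

    printf("next[1][1] = %f\n", next[1][1]);
    return 0;
}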

Compiling for Scalable Multiprocessors with Polaris

Yunheung Paek, David A. Padua
1997 Parallel Processing Letters  
In this paper, we discuss our work at Illinois on the development of compiler techniques for scalable shared memory multiprocessors with noncoherent caches.  ...  Due to the complexity of programming scalable multiprocessors with physically distributed memories, it is onerous to manually generate parallel code for these machines.  ...  Acknowledgements We would like to thank Cray Research Inc. and the Pittsburgh Supercomputing Center for granting machine time for the experiments reported in this paper.  ... 
doi:10.1142/s0129626497000413 fatcat:gta4lqs46raixl4hbppfpgbnay

Experience with a clustered parallel reduction machine

M Beemster, P.H Hartel, L.O Hertzberger, R.F.H Hofman, K.G Langendoen, L.L Li, R Milikowski, WG Vree, H.P Barendregt, J.C Mulder
1993 Future Generation Computer Systems  
It has been successfully applied to a number of algorithms, resulting in a benchmark of small and medium-size parallel functional programs.  ...  A clustered architecture has been designed to exploit divide and conquer parallelism in functional programs.  ...  GR/F 35081, FAST: Functional programming for ArrayS of Transputers.  ... 
doi:10.1016/0167-739x(93)90011-d fatcat:ank45yd6czgm5hmbfoaqws33mm
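The divide-and-conquer parallelism mentioned above can be illustrated, very loosely, by a recursive decomposition whose two halves share no data. The C sketch below is only an analogy (the paper targets functional programs on a clustered reduction machine), but it shows the recursive structure whose independent subproblems a cluster could evaluate in parallel.

/* Divide-and-conquer decomposition, written as a sequential C sketch for
 * illustration only.  Each half of the recursion is independent, so the
 * two calls could be evaluated on different cluster nodes. */
#include <stdio.h>

static long dc_sum(const long *v, long lo, long hi)
{
    if (hi - lo <= 1)
        return (hi > lo) ? v[lo] : 0;
    long mid = lo + (hi - lo) / 2;
    /* The two recursive halves touch disjoint data. */
    return dc_sum(v, lo, mid) + dc_sum(v, mid, hi);
}

int main(void)
{
    long v[1000];
    for (long i = 0; i < 1000; i++) v[i] = i;
    printf("sum = %ld\n", dc_sum(v, 0, 1000));
    return 0;
}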

Eliminating synchronization bottlenecks using adaptive replication

Martin C. Rinard, Pedro C. Diniz
2003 ACM Transactions on Programming Languages and Systems  
Given an unannotated sequential program written in C++, the compiler automatically extracts the concurrency, determines when it is legal to apply adaptive replication, and generates parallel code that  ...  In addition to automatic parallelization and adaptive replication, our compiler also implements a lock coarsening transformation that increases the granularity at which the computation locks objects.  ...  ACKNOWLEDGMENTS We would like to thank the anonymous referees of various versions of this article for their thoughtful and helpful comments.  ... 
doi:10.1145/641909.641911 fatcat:6ftcwn2lbbc3vhv2qb7spqujfm
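The replication idea described in the abstract, in rough outline: instead of serializing every update on one shared object, each thread updates a private replica and the replicas are combined once at the end. The sketch below is a hand-written, hypothetical illustration of that pattern with POSIX threads, not the code the compiler actually generates; compile with cc replicate.c -lpthread.

/* Replication sketch: the hot path updates a thread-private replica,
 * and only one combining update per thread takes the shared lock. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
#define NITERS   1000000

static double shared_total = 0.0;
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    double replica = 0.0;               /* thread-private replica */
    for (int i = 0; i < NITERS; i++)
        replica += 1.0;                 /* no lock on the hot path */

    pthread_mutex_lock(&total_lock);    /* one combining update per thread */
    shared_total += replica;
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];
    for (int t = 0; t < NTHREADS; t++)
        pthread_create(&tid[t], NULL, worker, NULL);
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL);
    printf("total = %f\n", shared_total);
    return 0;
}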

Advances in Engineering Software for Multicore Systems [chapter]

Ali Jannesari
2018 Dependability Engineering  
This chapter proposes a set of methods that employ an optimistic semi-automatic approach, which enables programmers to exploit parallelism on modern hardware architectures.  ...  Another contribution is a method for detecting code sections where parallel design patterns might be applicable and suggesting relevant code transformations.  ...  In simple cases, the program is automatically transformed into its parallel version based on available parallelism and the identified parallel design patterns [9] .  ... 
doi:10.5772/intechopen.72784 fatcat:rxkrgppehndvrnqzdafew3vfte

A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction

Allen Leung, Nicolas Vasilache, Benoît Meister, Muthu Baskaran, David Wohlford, Cédric Bastoul, Richard Lethin
2010 Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units - GPGPU '10  
Communication and synchronizations operations at multiple levels are generated automatically. The resulting mapping is currently emitted in the CUDA programming language.  ...  The semantic transformations are expressed within the polyhedral model, including optimization of integrated parallelization, locality, and contiguity tradeoffs. Hierarchical tiling is performed.  ...  In addition to these, two basic transformations may be applied to make the produced code match the CUDA programming model.  ... 
doi:10.1145/1735688.1735698 dblp:conf/asplos/LeungVMBWBL10 fatcat:5vnudjbr6vae5mwktixizhywpy
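Tiling is one of the restructurings such a polyhedral mapper applies before assigning loops to GPU thread blocks and threads. The sketch below shows a plain, hand-tiled loop nest in C as an illustration of the transformation; the kernel and tile sizes are assumptions, and the real system derives the mapping automatically within the polyhedral model.

/* Hand-tiled y = A*x as a minimal illustration of loop tiling: the
 * (ii, jj) tile loops are the kind of loops a GPU mapper would assign
 * to thread blocks, with the intra-tile loops assigned to threads. */
#include <stdio.h>

#define N  512
#define TI 32
#define TJ 32

static double A[N][N], x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++)
            A[i][j] = (i == j) ? 2.0 : 0.0;
    }

    for (int ii = 0; ii < N; ii += TI)
        for (int jj = 0; jj < N; jj += TJ)
            for (int i = ii; i < ii + TI; i++)
                for (int j = jj; j < jj + TJ; j++)
                    y[i] += A[i][j] * x[j];

    printf("y[0] = %f\n", y[0]);
    return 0;
}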

Analysis and Evaluation of the Performance of CAPE

Van Long Tran, Eric Renault, Viet Hai Ha
2016 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld)  
CAPE is an approach based on checkpoints to allow the execution of OpenMP programs on distributed-memory architectures.  ...  On the other hand, OpenMP is very easy to use but is restricted to shared-memory architectures.  ...  It consists of a set of directives, functions and environment variables to easily support the transformation of a C, C++ or Fortran sequential program into a parallel program.  ... 
doi:10.1109/uic-atc-scalcom-cbdcom-iop-smartworld.2016.0104 dblp:conf/uic/TranRH16 fatcat:wg62kufkgzdq5lx2byp6bg4xf4

Domain-specific library generation for parallel software and hardware platforms

Franz Franchetti, Yevgen Voronenko, Peter A. Milder, Srinivas Chellappa, Marek R. Telgarsky, Hao Shen, Paolo D'Alberto, Frederic de Mesmay, James C. Hoe, Jose M. F. Moura, Markus Puschel
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
For the domain of linear transforms, Spiral automatically generates implementations for parallel platforms including SIMD vector extensions, multicore processors, field-programmable gate arrays (FPGAs)  ...  The performance of the generated code is competitive with the best available hand-written libraries.  ...  into shared memory code.  ... 
doi:10.1109/ipdps.2008.4536398 dblp:conf/ipps/FranchettiVMCTSDMHMP08 fatcat:f2o2k4tfhvhg7itarm4wznghte

A Modern Parallel Register Sharing Architecture for Code Compilation

Rajendra Kumar, Dr. P. K. Singh
2010 International Journal of Computer Applications  
Code generation for a parallel register sharing architecture involves issues that are not present in sequential code compilation and is inherently complex.  ...  On the implementation side, it has been observed that most applications are unable to exploit enough parallelism in the parallel register sharing architecture.  ...  First, powerful parallelizers should facilitate programming by allowing the development of much of the code in a familiar sequential programming language such as C.  ... 
doi:10.5120/334-505 fatcat:teqf26axmrekrbdriy7bnuoewq

Automatic C-to-CUDA Code Generation for Affine Programs [chapter]

Muthu Manikandan Baskaran, J. Ramanujam, P. Sadayappan
2010 Lecture Notes in Computer Science  
Hence the automatic transformation of sequential input programs into efficient parallel CUDA programs is of considerable interest.  ...  This paper describes an automatic code transformation system that generates parallel CUDA code from input sequential C code, for regular (affine) programs.  ...  This work was supported in part by the U.S.  ... 
doi:10.1007/978-3-642-11970-5_14 fatcat:euk4pngadbcrfdzheclqlslahu
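For context, "regular (affine)" input means loop bounds and array subscripts that are affine functions of the enclosing loop indices, which is what makes exact dependence analysis and automatic CUDA mapping possible. The C nest below is an assumed example of that input class, not a kernel from the paper.

/* Example of an affine loop nest: every bound and subscript is an affine
 * function of the loop indices i, j, k, so dependences can be analyzed
 * exactly and the i/j loops can be mapped to GPU blocks and threads. */
#include <stdio.h>

#define N 256

static float A[N][N], B[N][N], C[N][N];

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0f; B[i][j] = 2.0f; C[i][j] = 0.0f;
        }

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];

    printf("C[0][0] = %f\n", C[0][0]);
    return 0;
}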

An MDE Approach for Automatic Code Generation from UML/MARTE to OpenCL

A. Wendell O. Rodrigues, Frederic Guyomarc'h, Jean-Luc Dekeyser
2013 Computing in Science & Engineering (Print)  
The most commonly used standards are OpenMP (Open Multi-Processing) for shared memory and the Message Passing Interface (MPI) for distributed-memory programming.  ...  In MDE, a model transformation is a compilation process that transforms a source model into a target model.  ...  Acknowledgments This work is part of the Gaspard2 project, developed by the Dynamic Adaptivity and Real-Time (DART)  ... 
doi:10.1109/mcse.2012.35 fatcat:rozgjomllbhlrhdycqx5a26tou

Analysis of Multithreaded Programs [chapter]

Martin Rinard
2001 Lecture Notes in Computer Science  
The field of program analysis has focused primarily on sequential programming languages.  ...  of weak memory consistency models.  ...  Although many of these analyses were originally developed for the automatic parallelization of sequential programs, the basic approaches should generalize to handle the appropriate kinds of multithreaded  ... 
doi:10.1007/3-540-47764-0_1 fatcat:rwavkysiwbcblen2hshcaw6weq
Showing results 1 — 15 out of 31,150 results