31 Hits in 3.3 sec

Removal of redundant dependences in DOACROSS loops with constant dependences

V. P. Krothapalli, P. Sadayappan
1991 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '91  
These synchronization instructions can represent a significant part of the overhead in the parallel execution of a parallel program.  ...  Dependences constrain the parallel execution of an imperative program and are typically enforced by synchronization instructions.  ...  The second contribution of this paper is characterization of redundancy for doubly nested loops with constant dependence.  ...
doi:10.1145/109625.109632 dblp:conf/ppopp/KrothapalliS91 fatcat:nx5l7kz4uvdqndu6xgjlwwroje

Speculative Decoupled Software Pipelining

Neil Vachharajani, Ram Rangan, Easwaran Raman, Matthew J. Bridges, Guilherme Ottoni, David I. August
2007 Parallel Architecture and Compilation Techniques (PACT), Proceedings of the International Conference on  
By speculating past infrequent dependences, the benefit of DSWP is increased by making it applicable to more loops, facilitating better balanced threads, and enabling parallelized loops to be run on more  ...  To avoid burdening programmers with the responsibility of parallelizing their applications, some researchers have advocated automatic thread extraction.  ...  The authors acknowledge the support of the GSRC Focus Center, one of five research centers funded under the Focus Center Research Program, a Semiconductor Research Corporation program.  ... 
doi:10.1109/pact.2007.4336199 fatcat:zdfi2dh3ujcsvlshzwhwcnejbi

Using knowledge-based systems for research on parallelizing compilers

Chao-Tung Yang, Shian-Shyong Tseng, Yun-Woei Fann, Ting-Ku Tsai, Ming-Huei Hsieh, Cheng-Tien Wu
2001 Concurrency and Computation  
It is well known that the execution efficiency of a loop can be enhanced if the loop is executed in parallel or partially parallel, such as in a DOALL or DOACROSS loop.  ...  The PPD can extract the potential DOALL and DOACROSS loops in a program by verifying array subscripts.  ...  ACKNOWLEDGEMENT This work was supported in part by the National Science Council of the Republic of China under grant nos. NSC86-2213-E009-081 and NSC87-2213-E009-023.  ... 
doi:10.1002/cpe.563 fatcat:z5wil2dh6bexdgdvffrvokqqvq

Optimal loop parallelization

A. Aiken, A. Nicolau
1988 SIGPLAN notices  
The figure given for the original code is the better of the two approaches; in LL6 redundant load removal greatly improved the performance of the original code.  ...  In particular, the hardware support for indirect addressing used by vector machines (auto-increment of index registers) and a relatively sophisticated compiler optimization (removing redundant loads across  ...  PhD thesis, Univ. of Illinois at Urbana-Champaign, 1984. 191  ... 
doi:10.1145/960116.54021 fatcat:bhvrs4yfqraphhp2vps2dlo3ca

Optimal loop parallelization

A. Aiken, A. Nicolau
1988 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation - PLDI '88  
The figure given for the original code is the better of the two approaches; in LL6 redundant load removal greatly improved the performance of the original code.  ...  In particular, the hardware support for indirect addressing used by vector machines (auto-increment of index registers) and a relatively sophisticated compiler optimization (removing redundant loads across  ...  PhD thesis, Univ. of Illinois at Urbana-Champaign, 1984. 191  ... 
doi:10.1145/53990.54021 dblp:conf/pldi/AikenN88 fatcat:mf5uma46zjba5odz3gani7m2nm

HELIX

Simone Campanoni, Timothy Jones, Glenn Holloway, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
The framework uses an analytical model of loop speedups, combined with profile data, to choose loops to parallelize.  ...  We show that the inter-thread communication costs forced by loop-carried data dependences can be mitigated by code optimization, by using an effective heuristic for selecting loops to parallelize, and  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of our sponsors.  ... 
doi:10.1145/2259016.2259028 dblp:conf/cgo/CampanoniJHRWB12 fatcat:saxndpn5rvhodc7nsnfl7tjmxq

Restructuring Fortran programs for Cedar

Rudolf Eigenmann, Jay Hoeflinger, Greg Jaxon, Zhiyuan Li, David Padua
1993 Concurrency Practice and Experience  
A collection of experiments illustrate the effectiveness of the current implementation, and point toward new approaches to be incorporated into the system in the near future.  ...  This paper reports on the status of the Fortran translator for the Cedar computer at the end of March, 1991.  ...  We have extended the restructurer to cope with the challenges presented by the Cedar machine. The modified restructurer performs well on some linear algebra routines and synthetic loops.  ...
doi:10.1002/cpe.4330050704 fatcat:mvjdpxq555g2bjldvf6bjqhdmy

Speculative parallelization using state separation and multiple value prediction

Chen Tian, Min Feng, Rajiv Gupta
2010 Proceedings of the 2010 international symposium on Memory management - ISMM '10  
The key idea is to generate multiple versions of a loop iteration based on multiple predictions of values of variables involved in cross-iteration dependences (i.e., live-in variables).  ...  Consequently, for a loop with frequently arising cross-iteration dependences, previous techniques are not able to speed up the execution.  ...  Acknowledgments This work is supported by NSF grants CCF-0963996, CCF-0905509, CNS-0751961, and CNS-0810906 to the University of California, Riverside.  ... 
doi:10.1145/1806651.1806663 dblp:conf/iwmm/TianFG10 fatcat:itjz34mdlvaadaovae3d7j7ap4

Discovery of Potential Parallelism in Sequential Programs

Zhen Li, Ali Jannesari, Felix Wolf
2013 2013 42nd International Conference on Parallel Processing  
with high accuracy, identifying 92.5% of the parallel loops in NAS benchmarks; 3) when parallelizing well-known open-source software following the outputs of the framework, reasonable speedups are obtained  ...  Results of our experiments show that 1) the efficient data-dependence profiler has a very competitive average slowdown of around 80× with accuracy higher than 99.6%; 2) the framework discovers parallelism  ...  Rule of determining DOACROSS loops A loop is classified as a DOACROSS loop if it is not a DOALL loop, and there is no inter-iteration dependence that starts from the read phase of the first CU (in single-iteration  ... 
doi:10.1109/icpp.2013.119 dblp:conf/icpp/LiJW13 fatcat:6dc5s2ao4rhv7avxb4oai77hoi

Automatic detection of nondeterminacy in parallel programs

Perry A. Emrath, David A. Padua
1989 SIGPLAN notices  
In this way the user may avoid being overwhelmed with redundant or unnecessary information. thread of the loop will always change K(1) from 0 to I squared, but on some runs the array K may go through the  ...  Consider the following loops: In some cases the distance may not be a constant or may not be possible to compute by the techniques used in the source analyzer.  ...  A second part of the user interface should accept information from the user, either in the form of assertions or commands. The assertions will reduce the number of races that have to be reported.  ... 
doi:10.1145/69215.69224 fatcat:natcejljurav3bcouh4t3wq34q

Automatic detection of nondeterminacy in parallel programs

Perry A. Emrath, David A. Padua
1988 Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging - PADD '88  
In this way the user may avoid being overwhelmed with redundant or unnecessary information. thread of the loop will always change K(1) from 0 to I squared, but on some runs the array K may go through the  ...  Consider the following loops: In some cases the distance may not be a constant or may not be possible to compute by the techniques used in the source analyzer.  ...  A second part of the user interface should accept information from the user, either in the form of assertions or commands. The assertions will reduce the number of races that have to be reported.  ... 
doi:10.1145/68210.69224 dblp:conf/pdd/EmrathP88 fatcat:htkevhrwdnenngqjw2sdw4o5oe

Data distribution support on distributed shared memory multiprocessors

Rohit Chandra, Ding-Kai Chen, Robert Cox, Dror E. Maydan, Nenad Nedeljkovic, Jennifer M. Anderson
1997 SIGPLAN notices  
In addition to making effective use of caches, it is often necessary to distribute data structures across the local memories of the processing nodes, thereby reducing the latency of cache misses.  ...  substantial performance gains, in some cases by as much as a factor of 3 over the same codes without distribution.  ...  Acknowledgments: We thank Jeff McDonald who helped us with the LU application, Chau-Wen Tseng who participated in the initial stages of this work, and Seema Hiranandani and the anonymous referees who offered  ...
doi:10.1145/258916.258945 fatcat:bi5pvigbjfd3viweyr4grbytmu

An efficient algorithm for the run-time parallelization of DOACROSS loops

Ding Kai Chen, Josep Torrellas, Pen Chung Yew
1994 Supercomputing, Proceedings  
Our scheme handles any type of data dependence in the loop without requiring any special architectural support in the multiprocessor.  ...  While loop parallelization usually relies on compile-time analysis of data dependences, for some loops the data dependences cannot be determined at compile time.  ...  In addition, as indicated above it removes redundant operations in the inspector.  ...
doi:10.1145/602855.602857 fatcat:xm6s3sysdraipe72uzykpquyqe

Parallelized Algorithms for Finding Similar Images and Object Recognition

Rafal Fraczek, Boguslaw Cyganek, Kazimierz Wiatr
2013 Computer Science  
The paper addresses the issue of searching for similar images and objects in a repository of information. The contained images are annotated with the help of the sparse descriptors.  ...  Results of these experiments, as well as discussion of the advantages and limitations of different combinations of methods are presented.  ...  Acknowledgements This research was supported from the Polish funds for scientific research in the years 2010/2011 under the Synat project.  ...
doi:10.7494/csci.2013.14.1.113 fatcat:mt7euvqqcjbsbgk3isbxip5ayy

Exploiting Tightly-Coupled Cores

Daniel Bates, Alex Bradbury, Andreas Koltes, Robert Mullins
2014 Journal of Signal Processing Systems  
We explore the tile's ability to support a range of parallelisation opportunities and detail the control and communication mechanisms needed to exploit each core's resources in a flexible manner.  ...  We evaluate one such design, called Loki, that aims to support specialisation in software on a homogeneous many-core architecture.  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s)  ... 
doi:10.1007/s11265-014-0944-6 fatcat:oaujz5oqqncrlotwnv2sxp6qvy
Showing results 1-15 of 31