
Run-time methods for parallelizing partially parallel loops

Lawrence Rauchwerger, Nancy M. Amato, David A. Padua
1995 Proceedings of the 9th international conference on Supercomputing - ICS '95  
In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that  ...  In addition, it can implement at run-time the two most effective transformations for increasing the amount of parallelism in a loop: array privatization and reduction parallelization (element-wise).  ...  We would like to thank Paul Petersen for his useful advice, and William Blume and Brett Marsolf for identifying and clarifying applications for our experiments.  ... 
doi:10.1145/224538.224553 dblp:conf/ics/RauchwergerAP95 fatcat:zzvbbeyllndc5ovpini6ed4jym
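
To make the two transformations named above concrete, here is a minimal OpenMP-in-C sketch of array privatization and element-wise reduction parallelization; the kernel, array names, and sizes are hypothetical, and this illustrates the transformations in general rather than the paper's run-time scheme.

    #include <stdio.h>

    #define N 1000

    /* Hypothetical kernel.  tmp[] is written before it is read in every
     * iteration, so each thread may keep a private copy (privatization);
     * sum is an associative accumulation, so per-thread partial sums can
     * be combined at the end (reduction parallelization). */
    int main(void) {
        double a[N], sum = 0.0;
        for (int i = 0; i < N; i++) a[i] = 0.5 * i;

        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            double tmp[4];              /* privatized: no cross-iteration reuse */
            for (int k = 0; k < 4; k++)
                tmp[k] = a[i] + k;
            sum += tmp[0] + tmp[3];     /* element-wise reduction */
        }
        printf("sum = %f\n", sum);
        return 0;
    }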

Run-time parallelization: Its time has come

Lawrence Rauchwerger
1998 Parallel Computing  
After a description of the problem of loop parallelization and its difficulties, a general overview of the need for run-time parallelization techniques is given.  ...  A survey of the various approaches to parallelizing partially parallel loops and fully parallel loops is presented.  ...  Run-Time Techniques for Fully Parallel (Doall) Loops: Most approaches to run-time parallelization have concentrated on developing methods for constructing execution schedules for partially parallel loops  ... 
doi:10.1016/s0167-8191(98)00024-6 fatcat:tovbe2cdbfhdjd4bdzxw2lzp6u

Run-time parallelization for loops

Shih-Hung Kao, Chao-Tung Yang, Shian-Shyong Tseng
1996 Proceedings of HICSS-29: 29th Hawaii International Conference on System Sciences  
In this paper, a run-time technique based on the insp/exec scheme (inspector phase and executor phase) is proposed for finding parallelism in loops.  ...  appropriate loop scheduling should be considered to achieve high parallelism.  ...  We would like to thank the anonymous reviewers for suggesting improvements and offering encouragement.  ... 
doi:10.1109/hicss.1996.495467 dblp:conf/hicss/KaoYT96 fatcat:cpp2mo47u5hwdlchem6lmh3t3i
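
A rough illustration of the insp/exec scheme (a generic wavefront inspector, not the authors' algorithm; all names are hypothetical): the inspector scans the run-time subscripts once to assign each iteration a wavefront, and the executor runs the iterations of each wavefront as a parallel step.

    #include <stdlib.h>

    /* Hypothetical loop: for i in [0,n): x[idx[i]] += 1.0, where the
     * subscript array idx[] is unknown until run time. */
    void insp_exec(double *x, const int *idx, int n, int m) {
        int *wf   = malloc(n * sizeof *wf);   /* wavefront of iteration i */
        int *last = malloc(m * sizeof *last); /* last iteration writing e */
        for (int e = 0; e < m; e++) last[e] = -1;

        /* Inspector: one cheap sequential pass over the subscripts. */
        int maxwf = 0;
        for (int i = 0; i < n; i++) {
            int p = last[idx[i]];
            wf[i] = (p < 0) ? 0 : wf[p] + 1;  /* run after the last writer */
            last[idx[i]] = i;
            if (wf[i] > maxwf) maxwf = wf[i];
        }

        /* Executor: iterations in the same wavefront touch distinct
         * elements, so each wavefront is a safe doall step. */
        for (int w = 0; w <= maxwf; w++) {
            #pragma omp parallel for
            for (int i = 0; i < n; i++)
                if (wf[i] == w) x[idx[i]] += 1.0;
        }
        free(wf); free(last);
    }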

A scalable method for run-time loop parallelization

Lawrence Rauchwerger, Nancy M. Amato, David A. Padua
1995 International journal of parallel programming  
This new method has advantages over all previous run-time techniques for analyzing and scheduling partially parallel loops since none of them has all of these desirable properties.  ...  In this paper we give a new run-time technique for finding an optimal parallel execution schedule for a partially parallel loop, i.e., a loop whose parallelization requires synchronization to ensure that  ...  Acknowledgment We would like to thank Paul Petersen for his useful advice, and William Blume and Brett Marsolf for identifying and clarifying applications for our experiments.  ... 
doi:10.1007/bf02577866 fatcat:rwplt6ri6ncizn4tb3gj6naflq

Speculative Parallelization of Partially Parallel Loops [chapter]

Francis H. Dang, Lawrence Rauchwerger
2000 Lecture Notes in Computer Science  
Moreover, the existing partial parallelism of loops is not exploited.  ...  limits potential slowdowns to the overhead of the run-time dependence test itself, i.e., removes the time lost due to incorrect parallel execution.  ...  While for fully parallel loops the method performs very well, partially parallel loops will experience a slowdown equal to the speculative parallel execution time (the loop has to be re-executed sequentially  ... 
doi:10.1007/3-540-40889-4_22 fatcat:4yiuncsiurawfivld6cn75cz2m

Storage Mapping Optimization for Parallel Programs [chapter]

Albert Cohen, Vincent Lefebvre
1999 Lecture Notes in Computer Science  
Parallelization via memory expansion requires both moderation in the expansion degree and efficiency at run-time.  ...  We present a general storage mapping optimization framework for imperative programs, applicable to most loop nest parallelization techniques.  ...  Acknowledgments: We would like to thank Jean-François Collard, Paul Feautrier and Denis Barthou for their help and support.  ... 
doi:10.1007/3-540-48311-x_49 fatcat:b3iiem64ffeu7av36szw3uiy6i
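
To make "memory expansion" concrete, a hedged before/after sketch in C (a generic illustration, not the paper's framework): a scalar temporary reused across iterations carries anti and output dependences, and expanding it to one cell per iteration removes them at the cost of O(n) storage; that expansion degree is what storage mapping optimization then tries to shrink.

    #include <stdlib.h>

    /* Before: the single cell t is written and read in every iteration,
     * so false (anti/output) dependences serialize the loop. */
    void before(const double *a, double *b, int n) {
        double t;
        for (int i = 0; i < n; i++) {
            t = 2.0 * a[i];
            b[i] = t + 1.0;
        }
    }

    /* After maximal expansion: one cell of t per iteration makes the
     * loop a doall.  Here per-thread privatization would already
     * suffice, which is exactly the kind of cheaper mapping an
     * optimizing framework should prefer. */
    void after(const double *a, double *b, int n) {
        double *t = malloc(n * sizeof *t);
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            t[i] = 2.0 * a[i];
            b[i] = t[i] + 1.0;
        }
        free(t);
    }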

SmartApps: An Application Centric Approach to High Performance Computing [chapter]

Lawrence Rauchwerger, Nancy M. Amato, Josep Torrellas
2001 Lecture Notes in Computer Science  
In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures  ...  At run-time, after incorporating the code's input and the system's resources and state, the SmartApp performs a global optimization.  ...  Partially Parallel Loop Parallelization. We have previously developed a run-time technique for finding an optimal parallel execution schedule for a partially parallel loop [23, 24] .  ... 
doi:10.1007/3-540-45574-4_6 fatcat:3ajm4npybjfczgx4dmq7wnt6eq

The LRPD test

Lawrence Rauchwerger, David Padua
1995 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation - PLDI '95  
Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: it detects at run-time if the values stored in an array participate  ...  run-time data dependence test.  ...  Acknowledgment We would like to thank Paul Petersen for his useful advice, and William Blume and Gung-Chung Yang for identifying and clarifying applications for our experiments.  ... 
doi:10.1145/207110.207148 dblp:conf/pldi/RauchwergerP95 fatcat:ojt72qrkpzgfdm7hchsg774sme

Implementation Issues of Loop-Level Speculative Run-Time Parallelization [chapter]

Devang Patel, Lawrence Rauchwerger
1999 Lecture Notes in Computer Science  
We advocate a novel framework for the identification of parallel loops.  ...  It speculatively executes a loop as a doall and applies a fully parallel data dependence test to check for any unsatisfied data dependencies; if the test fails, then the loop is re-executed serially.  ...  Currently, candidate loops for run-time parallelization are marked by a special directive in the Fortran source code.  ... 
doi:10.1007/978-3-540-49051-7_13 fatcat:x3wlmc6nbrb3dogno7e4ri6cri
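
A hedged sketch of the speculate-test-rollback pattern described above (a simplification, not the LRPD marking scheme itself; the loop body and all names are hypothetical): run the loop as a doall while recording reads and writes in shadow arrays, then inspect the shadows and re-execute serially if a cross-iteration dependence may have existed.

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical loop: for i in [0,n): x[w[i]] = x[r[i]] + 1.0. */
    void spec_doall(double *x, const int *w, const int *r, int n, int m) {
        double *save = malloc(m * sizeof *save);  /* checkpoint for rollback */
        int *wr = calloc(m, sizeof *wr);          /* shadow write counts */
        int *rd = calloc(m, sizeof *rd);          /* shadow read flags   */
        memcpy(save, x, m * sizeof *save);

        #pragma omp parallel for                  /* speculative doall */
        for (int i = 0; i < n; i++) {
            #pragma omp atomic
            wr[w[i]]++;
            #pragma omp atomic write
            rd[r[i]] = 1;
            x[w[i]] = x[r[i]] + 1.0;
        }

        /* Test: an element written twice, or both read and written, may
         * carry a dependence.  (Conservative: it also trips when a
         * single iteration reads and writes the same element.) */
        int fail = 0;
        for (int e = 0; e < m && !fail; e++)
            fail = (wr[e] > 1) || (wr[e] && rd[e]);

        if (fail) {                               /* rollback, rerun serially */
            memcpy(x, save, m * sizeof *save);
            for (int i = 0; i < n; i++)
                x[w[i]] = x[r[i]] + 1.0;
        }
        free(save); free(wr); free(rd);
    }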

The R-LRPD test: speculative parallelization of partially parallel loops

F. Dang, Hao Yu, L. Rauchwerger
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
Moreover, the existing partial parallelism of loops is not exploited.  ...  Efficient Run-Time Parallelization Needed for All Loops: To achieve a high level of performance for a particular program on today's supercomputers, software developers are often forced to tediously hand-code  ...  Conclusion: In this paper we have shown how to exploit parallelism in loops that are less than fully parallel and thus cannot be parallelized either with compile-time analysis or with the original LRPD  ... 
doi:10.1109/ipdps.2002.1015493 dblp:conf/ipps/DangYR02 fatcat:o44hrlzobbemlfwthlnupdwck4

The LRPD test: speculative run-time parallelization of loops with privatization and reduction parallelization

L. Rauchwerger, D.A. Padua
1999 IEEE Transactions on Parallel and Distributed Systems  
Another important contribution of this paper is a novel method for reduction recognition which goes beyond syntactic pattern matching: It detects at run-time if the values stored in an array participate  ...  can speculatively apply these transformations and then check their validity at run-time.  ...  ACKNOWLEDGMENTS We would like to thank Paul Petersen for his useful advice, and William Blume, Gung-Chung Yang, and Andrei Vladimirescu for identifying and clarifying applications for our experiments.  ... 
doi:10.1109/71.752782 fatcat:wsjtf7kievftjdzsgmvckq7y2m

Faster algorithms for RNA-folding using the Four-Russians method

Balaji Venkatachalam, Dan Gusfield, Yelena Frid
2014 Algorithms for Molecular Biology  
The DP algorithm runs in cubic time and there have been many attempts at improving its running time. Here, we use the Four-Russians method to speed it up.  ...  We discuss the organization of the data structures to exploit coalesced memory access for fast running time. These ideas also help in improving the running time of the serial algorithms.  ...  John Owens for allowing access to the server in his lab; the faster running times reported are from that machine.  ... 
doi:10.1186/1748-7188-9-5 pmid:24602450 pmcid:PMC3996002 fatcat:e6fstiezhjenlo7ty6cnnbqhoi
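
The cubic-time DP referred to is, in its simplest base-pair-maximization (Nussinov-style) form, the standard recurrence below; this is a textbook reconstruction, not notation taken from the paper.

    \[
    B(i,j) \;=\; \max\Bigl(\, B(i+1,\,j-1) + \delta(S_i,S_j),\;
                 \max_{i \le k < j} \bigl( B(i,k) + B(k+1,\,j) \bigr) \Bigr),
    \qquad B(i,j) = 0 \text{ for } j \le i,
    \]

where $\delta(S_i,S_j)=1$ if bases $S_i$ and $S_j$ can pair and $0$ otherwise. Filling $O(n^2)$ cells, each over $O(n)$ split points $k$, gives the $O(n^3)$ bound that the Four-Russians tabulation reduces by a logarithmic factor.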

ASYNC Loop Constructs for Relaxed Synchronization [chapter]

Russell Meyers, Zhiyuan Li
2008 Lecture Notes in Computer Science  
ASYNC DO annotates a loop whose iterations can be executed by multiple processors, like OpenMP parallel DO loops in Fortran (or parallel for loops in C), but it does not require barrier synchronization.  ...  Conventional iterative solvers for partial differential equations impose strict data dependencies between each solution point and its neighbors.  ...  The authors thank the reviewers for their careful reviews and helpful suggestions. We also thank Ananth Grama for proposing the use of a relaxed barrier tree for reduction.  ... 
doi:10.1007/978-3-540-89740-8_20 fatcat:kvngwop3avecrgqbclzi747zvq
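
In standard OpenMP terms, the closest analogue of removing the barrier is the nowait clause; the sketch below is a generic illustration of relaxed synchronization, not the paper's ASYNC construct, and the solver is a hypothetical Jacobi-like sweep.

    #define N 1024

    /* With `nowait` there is no barrier between sweeps, so a thread may
     * read neighbor values from the previous or the current sweep.  The
     * data race on u[] is deliberate: chaotic/asynchronous relaxation
     * tolerates stale reads, so this pattern is acceptable only for
     * solvers known to converge under relaxed consistency. */
    void sweeps(double *u, const double *f, int iters) {
        #pragma omp parallel
        for (int t = 0; t < iters; t++) {
            #pragma omp for nowait      /* drop the end-of-loop barrier */
            for (int i = 1; i < N - 1; i++)
                u[i] = 0.5 * (u[i-1] + u[i+1]) + f[i];
        }
    }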

Smartapps, an application centric approach to high performance computing: compiler-assisted software and hardware support for reduction operations

F. Dang, M. Jesus Garzaran, M. Prvulovic, Ye Zhang, A. Jula, Hao Yu, N. Amato, L. Rauchwerger, J. Torrellas
2002 Proceedings 16th International Parallel and Distributed Processing Symposium  
In the executable of smart applications, the compiler embeds most run-time system services, and a performance-optimizing feedback loop that monitors the application's performance and adaptively reconfigures  ...  In this paper, we first describe the overall architecture of SmartApps and then present some achievements to date, focusing on compiler-assisted software and hardware techniques for parallelizing reduction  ...  We have recently developed a new technique that can extract the maximum available parallelism from a partially parallel loop and that removes limitations of previous methods (for partially parallel loops  ... 
doi:10.1109/ipdps.2002.1016572 dblp:conf/ipps/DangGPZJYART02 fatcat:mzcg6lqf35b63pt37dkbiaeis4

Data partitioning-based parallel irregular reductions

Eladio Gutiérrez, Oscar Plata, Emilio L. Zapata
2004 Concurrency and Computation  
Different parallelization methods for irregular reductions on shared memory multiprocessors have been proposed in the literature in recent years.  ...  Efficient implementations of the proposed optimizing solutions for a particular method are presented, experimentally tested on static and dynamic kernel codes, and compared with other parallel reduction  ...  When the number of processors is larger than eight, a lower execution time is achieved with the partially expanded method using ρ = 4.  ... 
doi:10.1002/cpe.769 fatcat:mtq2yavhprgjjivgbjg2wtc2fm
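
One standard point in this design space, shown as a hedged sketch (array replication with a merge phase; the data-partitioning methods the paper studies avoid these replicated copies): each thread accumulates into a private copy of the reduction array, and the copies are combined afterwards.

    #include <stdlib.h>

    /* Hypothetical irregular reduction: hist[bin[i]] += val[i], where
     * bin[] is only known at run time, so iterations may collide. */
    void irregular_sum(double *hist, int m, const int *bin,
                       const double *val, int n) {
        #pragma omp parallel
        {
            double *priv = calloc(m, sizeof *priv);  /* per-thread copy */
            #pragma omp for nowait
            for (int i = 0; i < n; i++)
                priv[bin[i]] += val[i];              /* conflict-free   */

            for (int e = 0; e < m; e++) {            /* merge phase     */
                #pragma omp atomic
                hist[e] += priv[e];
            }
            free(priv);
        }
    }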