11,266 Hits in 5.6 sec

Maximizing parallelism and minimizing synchronization with affine transforms

Amy W. Lim, Monica S. Lam
1997 Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL '97
This paper presents an algorithm to find the optimal affine partitions that maximize the degree of parallelism and minimize the degree of synchronization in programs with arbitrary loop nestings and affine  ...  The algorithm presented subsumes previously proposed loop transformation algorithms that are based on unimodular transformations, loop distribution, fusion, scaling, reindexing, and statement reordering  ...  This model makes it easy to develop algorithms that maximize parallelism and minimize synchronization simultaneously.  ...
doi:10.1145/263699.263719 dblp:conf/popl/LimL97 fatcat:etohvz56xnb6fmsdfbw4prvyuu

A data locality optimizing algorithm

Monica S. Lam, Michael E. Wolf
2004 SIGPLAN Notices
Our algorithm is proven to maximize the degree of parallelism while minimizing the degree of synchronization [4, 5].  ...  The main research objective of the SUIF project was to improve data locality for uniprocessors and to maximize parallelism and minimize communication for multiprocessors.  ...
doi:10.1145/989393.989437 fatcat:eajlltosg5gjbhqtdituqhh3hi

A data locality optimizing algorithm

Michael E. Wolf, Monica S. Lam
1991 SIGPLAN Notices
Our algorithm is proven to maximize the degree of parallelism while minimizing the degree of synchronization [4, 5].  ...  The main research objective of the SUIF project was to improve data locality for uniprocessors and to maximize parallelism and minimize communication for multiprocessors.  ...
doi:10.1145/113446.113449 fatcat:2ovh6wjxmjhorne6ox3dlj47ha

Loop parallelization algorithms: From parallelism extraction to code generation

Pierre Boulet, Alain Darte, Georges-André Silber, Frédéric Vivien
1998 Parallel Computing  
ability to incorporate various optimizing criteria such as maximal parallelism detection, permutable loop detection, minimization of synchronizations, ease of code generation, etc.  ...  In this paper, we survey loop parallelization algorithms, analyzing the dependence representations they use, the loop transformations they generate, the code generation schemes they require, and their  ...  We want to generate as few parallel loops as possible and no sequential loops, in order to obtain, once again, maximal parallelism while minimizing synchronizations.  ...
doi:10.1016/s0167-8191(98)00020-9 fatcat:bxqeg7gta5f7dmqa6lwy6oewpm

Coarse-Grained Loop Parallelization: Iteration Space Slicing vs Affine Transformations

Anna Beletska, Wlodzimierz Bielecki, Albert Cohen, Marek Palkowski, Krzysztof Siedlecki
2009 2009 Eighth International Symposium on Parallel and Distributed Computing  
This paper presents a comparison of Iteration Space Slicing and Affine Transformation Framework algorithms aimed at extracting coarse-grained parallelism available in arbitrarily nested parameterized affine  ...  We demonstrate that Iteration Space Slicing permits extracting more coarse-grained parallelism than the Affine Transformation Framework.  ...  Affine transformations permit the extraction of coarse-grained parallelism represented with synchronization-free threads.  ...
doi:10.1109/ispdc.2009.15 dblp:conf/ispdc/BeletskaBCPS09 fatcat:ge2ju2saq5hpjf2cyyt2xfdyw4

Synchronization-Free Automatic Parallelization: Beyond Affine Iteration-Space Slicing [chapter]

Anna Beletska, Wlodzimierz Bielecki, Albert Cohen, Marek Palkowski
2010 Lecture Notes in Computer Science  
This paper contributes to the theory and practice of automatic extraction of synchronization-free parallelism in nested loops.  ...  The algorithm generates an outer loop to spawn synchronization-free slices to be executed in parallel, enclosing sequential loops iterating over those slices.  ...  Acknowledgments This work was partly supported by the SARC FET-27648 and ACOTES IST-34869 European FP6 projects.  ...
doi:10.1007/978-3-642-13374-9_16 fatcat:h5nsftswqrgfhfqtys3cocqd3i

Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading [chapter]

Sunil Shrestha, Joseph Manzano, Andres Marquez, John Feo, Guang R. Gao
2015 Lecture Notes in Computer Science  
of parallel tiles with an efficient synchronization registry.  ...  It takes advantage of polyhedral analysis and transformation in the form of PLUTO [6], combined with a highly optimized fine-grain tile runtime to exploit parallelism at all levels.  ...  Using its affine transformation framework, statement-wise transformations are applied to minimize communication across boundaries.  ...
doi:10.1007/978-3-319-17473-0_11 fatcat:z4mrrvucyrecvezcdvdkw4xlrq

Finding Free Schedules for Non-uniform Loops [chapter]

Volodymyr Beletskyy, Krzysztof Siedlecki
2003 Lecture Notes in Computer Science  
Algorithms that build free schedules for perfectly and imperfectly nested affine loops with non-uniform dependences are presented.  ...  This allows us to extract maximum loop parallelism. The algorithms require exact dependence analysis.  ...  Introduction A large number of transformations have been developed in the past to expose parallelism in loops, minimize synchronization, and improve memory locality, for example, [3] - [10], [14  ...
doi:10.1007/978-3-540-45209-6_44 fatcat:g6f7gmn6prhg7crlnlmkkgcij4

Automatic Extraction of Parallelism for Mobile Devices

Marek Pałkowski
2015 Przegląd Elektrotechniczny
The loops are parallelized and transformed into a multi-threaded application for the Android OS.  ...  Experiments are carried out on the UTDSP and NPB benchmark suites using an ARM quad-core processor. Performance benefits and power consumption are studied.  ...  However, the affine transformation framework does not exploit all parallelism with synchronization-free slices for some loops [4].  ...
doi:10.15199/48.2015.11.40 fatcat:exw6tgyphrailb4ipinvpanexa

Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework

Louis-Noel Pouchet, Uday Bondhugula, Cedric Bastoul, Albert Cohen, J. Ramanujam, P. Sadayappan
2010 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis  
tiling, vectorization and parallelization on the transformed program.  ...  Today's multi-core era places significant demands on an optimizing compiler, which must parallelize programs, exploit memory hierarchy, and leverage the ever-increasing SIMD capabilities of modern processors  ...  National Science Foundation through awards 0926687/0926688, and by the U.S. Army through contract W911NF-10-1-0004.  ... 
doi:10.1109/sc.2010.14 dblp:conf/sc/PouchetBBCRS10 fatcat:wtvly4ercbai3lo6t7ry4f7tee

Exploring the Impact of Affine Loop Transformations in Qubit Allocation [article]

Martin Kong
2020 arXiv pre-print
We conduct an extensive evaluation spanning 8 quantum circuits taken from the literature, 3 distinct coupling graphs, 4 affine transformations (including the Pluto dependence distance minimization and  ...  In this paper we explore the synergies and impact of affine loop transformations in the context of qubit allocation and mapping.  ...  ) to minimizing the maximal-dependence distance.  ... 
arXiv:2010.11999v1 fatcat:resho5oxnrf2blaymhw2lfnw44

A practical automatic polyhedral parallelizer and locality optimizer

Uday Bondhugula, Albert Hartono, J. Ramanujam, P. Sadayappan
2008 Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation - PLDI '08  
one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations.  ...  Experimental results from the tool show very high performance for local and parallel execution on multi-cores, when compared with state-of-the-art compiler frameworks from the research community as well  ...  National Science Foundation through grants 0121676, 0121706, 0403342, 0508245, 0509442, 0509467, and 0541409.  ... 
doi:10.1145/1375581.1375595 dblp:conf/pldi/BondhugulaHRS08 fatcat:oxeykavud5fqffeswz3o7k5ote

A practical automatic polyhedral parallelizer and locality optimizer

Uday Bondhugula, Albert Hartono, J. Ramanujam, P. Sadayappan
2008 SIGPLAN Notices
one driven by an integer linear optimization framework that takes an explicit view of finding good ways of tiling for parallelism and locality using affine transformations.  ...  Experimental results from the tool show very high performance for local and parallel execution on multi-cores, when compared with state-of-the-art compiler frameworks from the research community as well  ...  National Science Foundation through grants 0121676, 0121706, 0403342, 0508245, 0509442, 0509467, and 0541409.  ... 
doi:10.1145/1379022.1375595 fatcat:mx5tqjwvdzfelgf4j7rrwb7ojm

Loop parallelization in the polytope model [chapter]

Christian Lengauer
1993 Lecture Notes in Computer Science  
These transformations have a very intuitive interpretation and can be easily quantified and automated due to their mathematical foundation in linear programming and linear algebra.  ...  With the recent availability of massively parallel computers, the idea of loop parallelization is gaining significance, since it promises execution speed-ups of orders of magnitude.  ...  Thanks to Lothar Thiele, Patrice Quinton, Ed Deprettere and Vincent van Dongen for discussions of loop parallelization.  ... 
doi:10.1007/3-540-57208-2_28 fatcat:oe7cdnlnizestae6bpnda6epum

Parallel Code Generation for Mobile Devices

Marek Pałkowski
2015 Przegląd Elektrotechniczny
The loops are parallelized and transformed into a multi-threaded application for the Android OS.  ...  Experiments are carried out on the UTDSP and NPB benchmark suites using an ARM dual-core processor.  ...  However, the affine transformation framework does not exploit all parallelism with synchronization-free slices for some loops [2].  ...
doi:10.15199/48.2015.02.31 fatcat:w632sybclfhixllsprdmegb6im