
Automatic data and computation partitioning on scalable shared memory multiprocessors [chapter]

Sudarsan Tandri, Tarek S. Abdelrahman
1997 Lecture Notes in Computer Science  
... determined by the NUMA-CAG, and dynamic partitions obtained by re-partitioning the array between the two loop nests. ... Additionally, the suitability of data and/or computation partitions depends not only on data access patterns but also on the characteristics of the target multiprocessor. ...
doi:10.1007/bfb0017282
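
The static data-and-computation partitioning this snippet refers to can be pictured with a minimal owner-computes sketch in C/OpenMP. This is an illustration only, not the paper's NUMA-CAG algorithm; the array and the update are invented:

    #include <stdio.h>
    #include <omp.h>

    #define N 1024

    /* Static block partition: thread t owns rows [t*N/T, (t+1)*N/T).
       Owner-computes keeps each thread's accesses within its own rows,
       which a NUMA-aware allocator can then place in local memory. */
    int main(void) {
        static double a[N][N];
        #pragma omp parallel
        {
            int t = omp_get_thread_num();
            int T = omp_get_num_threads();
            int lo = t * N / T, hi = (t + 1) * N / T;
            for (int i = lo; i < hi; i++)
                for (int j = 0; j < N; j++)
                    a[i][j] = a[i][j] * 0.5 + 1.0;
        }
        printf("%f\n", a[0][0]);
        return 0;
    }

Re-partitioning between two loop nests, as the snippet describes, would amount to redistributing rows (or switching to a column partition) between two such parallel regions.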

SPM-aware scheduling for nested loops in CMP systems

Zhi Chen, Meikang Qiu
2013 ACM SIGBED Review  
Most prior work on nested loops either focuses on loop partitioning (tiling, collapsing, transformation, etc.) or on scheduling loops on different processors. ... 2) how to schedule nested loops on the different SPMs of each processor in a CMP system? ...
doi:10.1145/2518148.2518151
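
The scratchpad (SPM) side of this problem is easy to sketch: unlike a cache, an SPM is filled and drained by software. A minimal C sketch, with invented sizes and a plain buffer standing in for the SPM:

    #include <string.h>

    #define N    4096
    #define TILE 256    /* hypothetical SPM capacity, in elements */

    /* Stage one tile of 'a' through a small buffer that stands in for a
       software-managed scratchpad; compute on the buffer, write back. */
    void spm_sweep(float *a) {
        float spm[TILE];
        for (int base = 0; base < N; base += TILE) {
            memcpy(spm, a + base, TILE * sizeof(float));   /* DMA-in  */
            for (int k = 0; k < TILE; k++)
                spm[k] = spm[k] * spm[k];                  /* compute */
            memcpy(a + base, spm, TILE * sizeof(float));   /* DMA-out */
        }
    }

Scheduling, in the paper's sense, is then the question of which loop tiles to stage into which processor's SPM, and in what order.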

A data locality optimizing algorithm

Monica S. Lam, Michael E. Wolf
2004 SIGPLAN Notices
... optimizing locality across arbitrary loop nests based on affine partitioning [6]. ... Our paper "A Data Locality Optimizing Algorithm" presented an automatic blocking algorithm for perfect loop nests on uniprocessors and multiprocessors [8]. ...
doi:10.1145/989393.989437
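
The blocking (tiling) transformation at the heart of this paper is standard enough to show directly. Here is a textbook-style tiled matrix multiply in C; the tile size B is a placeholder to be tuned to cache capacity, and c[][] is assumed zero-initialized:

    #define N 512
    #define B 64    /* tile size, chosen so a B-by-B block fits in cache */

    /* Blocked matrix multiply: tiling the j and k loops keeps a B-by-B
       working set of b[][] resident in cache across iterations of i. */
    void matmul_tiled(double a[N][N], double b[N][N], double c[N][N]) {
        for (int jj = 0; jj < N; jj += B)
            for (int kk = 0; kk < N; kk += B)
                for (int i = 0; i < N; i++)
                    for (int j = jj; j < jj + B; j++) {
                        double s = c[i][j];
                        for (int k = kk; k < kk + B; k++)
                            s += a[i][k] * b[k][j];
                        c[i][j] = s;
                    }
    }

Wolf and Lam's contribution was doing this automatically: deciding which loops to transform and tile, via unimodular transformations plus tiling. The sketch shows only the end result for one loop order.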

A data locality optimizing algorithm

Michael E. Wolf, Monica S. Lam
1991 SIGPLAN Notices
... optimizing locality across arbitrary loop nests based on affine partitioning [6]. ... Our paper "A Data Locality Optimizing Algorithm" presented an automatic blocking algorithm for perfect loop nests on uniprocessors and multiprocessors [8]. ...
doi:10.1145/113446.113449

Enhancing the performance of autoscheduling in Distributed Shared Memory multiprocessors [chapter]

Dimitrios S. Nikolopoulos, Eleftherios D. Polychronopoulos, Theodore S. Papatheodorou
1998 Lecture Notes in Computer Science  
... scheduling of parallel tasks, and dynamic program adaptability on multiprogrammed shared-memory multiprocessors. ... Our technique partitions the application Hierarchical Task Graph and maps the derived partitions to clusters of processors in the DSM architecture. ...
doi:10.1007/bfb0057892

A compiler framework for optimization of affine loop nests for GPGPUs

Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan
2008 Proceedings of the 22nd annual international conference on Supercomputing - ICS '08  
In this paper, a number of issues are addressed towards the goal of developing a compiler framework for automatic parallelization and performance optimization of affine loop nests on GPGPUs: 1) approach ... factors for conflict-minimal data access from GPU shared memory; and 3) model-driven empirical search to determine optimal parameters for unrolling and tiling. ...
doi:10.1145/1375527.1375562 dblp:conf/ics/BaskaranBKRRS08
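
Of the issues listed, the empirical search is the easiest to illustrate without GPU code. A toy C version follows, with an invented stand-in kernel (the real framework searches over generated CUDA variants, and its model prunes the candidate set before timing anything):

    #include <stdio.h>
    #include <omp.h>

    #define N 1024

    static double a[N][N];

    /* Stand-in for a generated kernel variant: a blocked sweep over a[][]. */
    static void kernel_tiled(int B) {
        for (int ii = 0; ii < N; ii += B)
            for (int jj = 0; jj < N; jj += B)
                for (int i = ii; i < ii + B; i++)
                    for (int j = jj; j < jj + B; j++)
                        a[i][j] = a[i][j] * 0.5 + 1.0;
    }

    /* Empirical search: time each candidate tile size, keep the fastest. */
    int main(void) {
        int candidates[] = {8, 16, 32, 64, 128};   /* all divide N evenly */
        int best = candidates[0];
        double best_t = 1e30;
        for (int i = 0; i < 5; i++) {
            double t0 = omp_get_wtime();
            kernel_tiled(candidates[i]);
            double t = omp_get_wtime() - t0;
            printf("tile %4d: %.4f s\n", candidates[i], t);
            if (t < best_t) { best_t = t; best = candidates[i]; }
        }
        printf("best tile: %d\n", best);
        return 0;
    }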

Fusion of loops for parallelism and locality

N. Manjikian, T.S. Abdelrahman
1997 IEEE Transactions on Parallel and Distributed Systems  
The techniques are evaluated on a 56-processor KSR2 multiprocessor and on a 16-processor Convex SPP-1000 multiprocessor. ... In this paper, we present new techniques to: (1) allow fusion of loop nests in the presence of fusion-preventing dependences, (2) maintain parallelism and allow the parallel execution of fused loops with ...
doi:10.1109/71.577265
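
Basic loop fusion, the transformation this paper extends, merges adjacent nests so a value produced by one is consumed while still in cache. A minimal C example with invented arrays; the paper's contribution is handling the harder cases where a fusion-preventing dependence would otherwise forbid this merge:

    #define N 100000

    /* Before fusion: two separate loops each stream over b[].
       After fusion: one pass, so b[i] is still in cache (or a register)
       when the second statement consumes it. */
    void fused(const double *a, double *b, double *c) {
        for (int i = 0; i < N; i++) {
            b[i] = a[i] + 1.0;     /* originally loop nest 1 */
            c[i] = b[i] * 2.0;     /* originally loop nest 2 */
        }
    }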

Data layout optimization for GPGPU architectures

Jun Liu, Wei Ding, Ohyoung Jang, Mahmut Kandemir
2013 Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '13  
Specifically, we try to optimize the layout of the arrays accessed in affine loop nests, for both the device memory and shared memory, at both coarse-grain and fine-grain parallelization levels. ... Our approach employs a widely applicable strategy based on a novel concept called data localization. ...
doi:10.1145/2442516.2442546 dblp:conf/ppopp/LiuDJK13
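
A concrete instance of the kind of layout change such frameworks apply is the array-of-structs to struct-of-arrays rewrite, sketched in C below (names invented). On a GPU, the SoA form makes consecutive threads touch consecutive words, i.e. coalesced accesses:

    #define N (1 << 20)

    /* Array-of-structs: a loop reading only .x strides over unused fields. */
    struct particle_aos { float x, y, z, w; };

    /* Struct-of-arrays: consecutive iterations (or GPU threads) touch
       consecutive floats, giving unit-stride, coalescable accesses. */
    struct particle_soa { float x[N], y[N], z[N], w[N]; };

    float sum_x(const struct particle_soa *p) {
        float s = 0.0f;
        for (int i = 0; i < N; i++)
            s += p->x[i];
        return s;
    }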

On-chip memory space partitioning for chip multiprocessors using polyhedral algebra

O. Ozturk, M.J. Irwin, M. Kandemir
2010 IET Computers & Digital Techniques  
We evaluate the resulting memory configurations using a set of benchmarks and compare them to pure private and pure shared memory on-chip multiprocessor architectures. ... One of the most important issues in designing a chip multiprocessor is to decide its on-chip memory organisation. ...
doi:10.1049/iet-cdt.2009.0089

Exploiting wavefront parallelism on large-scale shared-memory multiprocessors

N. Manjikian, T.S. Abdelrahman
2001 IEEE Transactions on Parallel and Distributed Systems  
Wavefront parallelism, in which parallelism is limited to hyperplanes in an iteration space, can arise when compilers apply tiling to loop nests to enhance locality. ... In this paper, we show that on large-scale shared-memory multiprocessors, locality is a crucial factor. ...
doi:10.1109/71.914756
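
Wavefront (hyperplane) parallelism itself is easy to demonstrate. In the Gauss-Seidel-style sweep below (an invented stencil over interior points), no single loop is parallel, but every point on an anti-diagonal i + j = d depends only on earlier diagonals, so each diagonal can run in parallel:

    #include <omp.h>

    #define N 1024

    /* a[i][j] depends on the already-updated a[i-1][j] and a[i][j-1],
       both of which lie on diagonal d-1, so each diagonal d is parallel. */
    void wavefront(double a[N][N]) {
        for (int d = 2; d <= 2 * (N - 2); d++) {
            int lo = d - (N - 2) > 1 ? d - (N - 2) : 1;
            int hi = d - 1 < N - 2 ? d - 1 : N - 2;
            #pragma omp parallel for
            for (int i = lo; i <= hi; i++) {
                int j = d - i;
                a[i][j] = 0.25 * (a[i-1][j] + a[i][j-1]
                                + a[i+1][j] + a[i][j+1]);
            }
        }
    }

The paper's point is that on large machines this parallelism is not enough by itself: which processor gets which chunk of each diagonal determines locality, and that choice dominates performance.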

Automatic data movement and computation mapping for multi-level parallel architectures with explicitly managed memories

Muthu Manikandan Baskaran, Uday Bondhugula, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan
2008 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08  
We also address the problem of mapping computation in regular programs to multi-level parallel architectures using a multi-level tiling approach, and study the impact of on-chip memory availability on ... Several parallel architectures such as GPUs and the Cell processor have fast explicitly managed on-chip memories, in addition to slow off-chip memory. ...
doi:10.1145/1345206.1345210 dblp:conf/ppopp/BaskaranBKRRS08
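
Explicitly managed memories make data movement the compiler's job rather than the hardware's. A two-level tiling sketch in plain C, with invented sizes; the inner buffer stands in for GPU shared memory or a Cell local store, and the explicit copies for the transfers the compiler would generate:

    #define N  4096
    #define T1 512    /* outer tile: unit of work for one processor      */
    #define T0 64     /* inner tile: fits the explicitly managed buffer  */

    /* Two-level tiling with explicit copies: stage each inner tile into
       'onchip', compute on it, then copy the results back out. */
    void two_level(float *a) {
        float onchip[T0];
        for (int o = 0; o < N; o += T1)            /* distribute across PEs */
            for (int i = o; i < o + T1; i += T0) {
                for (int k = 0; k < T0; k++) onchip[k] = a[i + k]; /* in  */
                for (int k = 0; k < T0; k++) onchip[k] += 1.0f;    /* run */
                for (int k = 0; k < T0; k++) a[i + k] = onchip[k]; /* out */
            }
    }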

Design of parallel algorithms for a distributed memory hypercube

E.L. Zapata, O.G. Plata, F.F. Rivera
1992 Microprocessors and Microsystems
On the other hand, the structured model, or SPMD (single-program/multiple-data) model, attempts to mimic the simplicity of programming synchronous parallel systems. ... In the unstructured model, or MPMD (multiple-program/multiple-data), each processor has its own local data and its own local program that processes this data. ...
doi:10.1016/0141-9331(92)90107-5
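
The SPMD model described here is the one MPI later made standard: one program, parameterized by rank. A self-contained C/MPI sketch of block-partitioned SPMD work (the computation itself is invented):

    #include <stdio.h>
    #include <mpi.h>

    #define N 1000000

    /* SPMD: every rank runs this same program on its own block of the
       iteration space; rank and size select the block. */
    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        long lo = (long)rank * N / size, hi = (long)(rank + 1) * N / size;
        double local = 0.0, total = 0.0;
        for (long i = lo; i < hi; i++)
            local += 1.0 / (double)(i + 1);   /* each rank's local work */

        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0) printf("sum = %f\n", total);
        MPI_Finalize();
        return 0;
    }

An MPMD version of the same job would instead launch a different executable on each node; SPMD folds that into one program plus a rank test.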

OpenMP to GPGPU

Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
2009 Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '09  
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both ... The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. ...
doi:10.1145/1504176.1504194 dblp:conf/ppopp/LeeME09
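
The input side of such a translator is ordinary OpenMP C. A JACOBI-style loop like the sketch below (array names invented) is what the translator would map to a GPU kernel, with the parallel-for iterations becoming GPU threads:

    #define N 2048

    /* One Jacobi relaxation step: the outer parallel loop is the unit
       an OpenMP-to-GPGPU translator turns into a CUDA kernel launch. */
    void jacobi_step(float a[N][N], float b[N][N]) {
        #pragma omp parallel for
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                b[i][j] = 0.25f * (a[i-1][j] + a[i+1][j]
                                 + a[i][j-1] + a[i][j+1]);
    }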

OpenMP to GPGPU

Seyong Lee, Seung-Jai Min, Rudolf Eigenmann
2009 SIGPLAN Notices
Experimental results from two important kernels (JACOBI and SPMUL) and two NAS OpenMP Parallel Benchmarks (EP and CG) show that the described translator and compile-time optimizations work well on both ... The goal of this translation is to further improve programmability and make existing OpenMP applications amenable to execution on GPGPUs. ...
doi:10.1145/1594835.1504194

Efficient runtime thread management for the nano-threads programming model [chapter]

Dimitrios S. Nikolopoulos, Eleftherios D. Polychronopoulos, Theodore S. Papatheodorou
1998 Lecture Notes in Computer Science  
The proposed mechanisms attempt to obtain maximum benefits from data locality on cache-coherent NUMA multiprocessors. ... The nano-threads programming model was proposed to effectively integrate multiprogramming on shared-memory multiprocessors with the exploitation of fine-grain parallelism from standard applications. ...
doi:10.1007/3-540-64359-1_688