
Compiling dynamic mappings with array copies

Fabien Coelho
1997 SIGPLAN notices  
This paper describes techniques to handle dynamic mappings through simple array copies: array remappings are translated into copies between statically mapped distinct versions of the array.  ...  These techniques are implemented in our prototype HPF compiler.  ...  The idea is to translate a program with dynamic mappings into a standard HPF program with copies between differently mapped arrays, as outlined in Figure Condition 1 is illustrated in Figure 5: Array  ... 
doi:10.1145/263767.263786 fatcat:5l7paqd37vfhheuzqikjtpnh2a

Compiling dynamic mappings with array copies

Fabien Coelho
1997 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '97  
This paper describes techniques to handle dynamic mappings through simple array copies: array remappings are translated into copies between statically mapped distinct versions of the array.  ...  These techniques are implemented in our prototype HPF compiler.  ...  The idea is to translate a program with dynamic mappings into a standard HPF program with copies between differently mapped arrays, as outlined in Figure Condition 1 is illustrated in Figure 5: Array  ... 
doi:10.1145/263764.263786 dblp:conf/ppopp/Coelho97 fatcat:jegwml37ovdnlhrhtf2wns5wv4

Kokkos Array performance-portable manycore programming model

H. Carter Edwards, Daniel Sunderland
2012 Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '12  
The Kokkos Array API uses C++ template meta-programming to, at compile time, transparently insert device-optimal data access maps into computational kernels.  ...  With this programming model computational kernels can be written once and, without modification, performance-portably compiled to multicore-CPU and manycore-accelerator devices.  ...  Multidimensional Array Map. A multidimensional array maps its multi-index space to its data members with a one-to-one mapping.  ... 
doi:10.1145/2141702.2141703 dblp:conf/ppopp/EdwardsS12 fatcat:m43m533rr5bqzhrna454yrdrgi

Automatic CPU-GPU communication management and optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, David I. August
2011 Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation - PLDI '11  
depending on the strength of static compile-time analyses or on programmer-supplied annotations.  ...  This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without  ...  When copying heap or stack allocation units to the GPU, map dynamically allocates GPU memory, but global variables must be copied into their associated named regions.  ... 
doi:10.1145/1993498.1993516 dblp:conf/pldi/JablinPJJBA11 fatcat:4pvn32hwuvatvn7cth6qimczlq

Dynamically managed data for CPU-GPU architectures

Thomas B. Jablin, James A. Jablin, Prakash Prabhu, Feng Liu, David I. August
2012 Proceedings of the Tenth International Symposium on Code Generation and Optimization - CGO '12  
By replacing static analyses with a dynamic run-time system, DyManD overcomes the performance limitations of alias analysis and enables management for complex and recursive data-structures.  ...  This paper presents Dynamically Managed Data (DyManD), the first automatic system to manage complex and recursive data-structures without static analyses.  ...  DyManD consists of a run-time library and a set of compiler passes.  ... 
doi:10.1145/2259016.2259038 dblp:conf/cgo/JablinJPLA12 fatcat:hzbom6p5cfa6znrca63murihru

Automatic CPU-GPU communication management and optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, David I. August
2012 SIGPLAN notices  
depending on the strength of static compile-time analyses or on programmer-supplied annotations.  ...  This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without  ...  When copying heap or stack allocation units to the GPU, map dynamically allocates GPU memory, but global variables must be copied into their associated named regions.  ... 
doi:10.1145/2345156.1993516 fatcat:fpjzdtdj2zhsfog5huiqf7dhcq

Automatic CPU-GPU communication management and optimization

Thomas B. Jablin, Prakash Prabhu, James A. Jablin, Nick P. Johnson, Stephen R. Beard, David I. August
2011 SIGPLAN notices  
depending on the strength of static compile-time analyses or on programmer-supplied annotations.  ...  This system, called the CPU-GPU Communication Manager (CGCM), consists of a run-time library and a set of compiler transformations that work together to manage and optimize CPU-GPU communication without  ...  When copying heap or stack allocation units to the GPU, map dynamically allocates GPU memory, but global variables must be copied into their associated named regions.  ... 
doi:10.1145/1993316.1993516 fatcat:uypiwdc4efennonakcpor3koay

A framework for integrating data alignment, distribution, and redistribution in distributed memory multiprocessors

J. Garcia, E. Ayguade, J. Labarta
2001 IEEE Transactions on Parallel and Distributed Systems  
The data layout strategy generated is optimal according to our current cost and compilation models.  ...  Abstract: Parallel architectures with physically distributed memory provide a cost-effective scalability to solve many large scale scientific problems.  ...  In this case, the mapping is said to be dynamic. Note that a dynamic data mapping requires data movement to reorganize the data layout between code blocks.  ... 
doi:10.1109/71.920590 fatcat:kblo63hpfvd2hmbevrehzw2wxe

Runtime compilation techniques for data partitioning and communication schedule reuse

R. Ponnusamy, J. Saltz, A. Choudhary
1993 Proceedings of the 1993 ACM/IEEE conference on Supercomputing - Supercomputing '93  
The first mechanism invokes a user specified mapping procedure via a set of compiler directives.  ...  In this paper, we describe two new ideas by which HPF compiler can deal with irregular computations effectively.  ...  authors would like to thank Geoffrey Fox, The authors would also like to gratefully acknowledge the help of Zeki Bozkus and Tom Haupt and the time they spent orienting us to internals of the Fortran 90D compiler  ... 
doi:10.1145/169627.169752 dblp:conf/sc/PonnusamySC93 fatcat:lhrszhdlrbhyzkgoztt4vckd6i

An efficient implementation of SELF a dynamically-typed object-oriented language based on prototypes

C. Chambers, D. Ungar, E. Lee
1989 SIGPLAN notices  
The representation of a map is similar. Map objects begin with mark and map words. All map objects share the same map, called the "map map." The map map is its own map.  ...  , and array accesses, with their hard-wired definitions.  ... 
doi:10.1145/74878.74884 fatcat:ty7jkbxjkvffdj2z2cx2oq43oq

An efficient implementation of SELF a dynamically-typed object-oriented language based on prototypes

C. Chambers, D. Ungar, E. Lee
1989 Conference proceedings on Object-oriented programming systems, languages and applications - OOPSLA '89  
The representation of a map is similar. Map objects begin with mark and map words. All map objects share the same map, called the "map map." The map map is its own map.  ...  byte array Each object begins with two header words.  ...  All times are in milliseconds of CPU time, except for the Smalltalk times, which are in milliseconds of real time; the real time measurements for the SELF system and the compiled C program are practically  ... 
doi:10.1145/74877.74884 dblp:conf/oopsla/ChambersUL89 fatcat:rzj4rs6tkjajfnwozyatcbbdly

An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes

Craig Chambers, David Ungar, Elgin Lee
1991 LISP and Symbolic Computation  
The representation of a map is similar. Map objects begin with mark and map words. All map objects share the same map, called the "map map." The map map is its own map.  ...  byte array Each object begins with two header words.  ...  All times are in milliseconds of CPU time, except for the Smalltalk times, which are in milliseconds of real time; the real time measurements for the SELF system and the compiled C program are practically  ... 
doi:10.1007/bf01806108 fatcat:s2fff5ue5zhofcog3lwakfhuz4

CUDA-NP

Yi Yang, Huiyang Zhou
2014 SIGPLAN notices  
However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs.  ...  Then, our CUDA-NP compiler automatically generates the optimized GPU kernels.  ...  So, in Section 3.3, we discuss our compiler transformation to deal with live array-variables, which are located in local memory.  ... 
doi:10.1145/2692916.2555254 fatcat:kxedfqo55fgrdoxghfjjbi27tu

CUDA-NP: Realizing Nested Thread-Level Parallelism in GPGPU Applications

Yi Yang, Chao Li, Huiyang Zhou
2015 Journal of Computer Science and Technology  
However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs.  ...  Then, our CUDA-NP compiler automatically generates the optimized GPU kernels.  ...  So, in Section 3.3, we discuss our compiler transformation to deal with live array-variables, which are located in local memory.  ... 
doi:10.1007/s11390-015-1500-y fatcat:42baxdr3hbf6tnxgo25ycfp2c4

CUDA-NP

Yi Yang, Huiyang Zhou
2014 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming - PPoPP '14  
However, with dynamic parallelism, a parent thread can only communicate with its child threads through global memory and the overhead of launching GPU kernels is non-trivial even within GPUs.  ...  Then, our CUDA-NP compiler automatically generates the optimized GPU kernels.  ...  So, in Section 3.3, we discuss our compiler transformation to deal with live array-variables, which are located in local memory.  ... 
doi:10.1145/2555243.2555254 dblp:conf/ppopp/YangZ14 fatcat:ts2yttcyzndrliod625rkzjyty