DISC: A Domain-Interaction Based Programming Model with Support for Heterogeneous Execution

Mehmet Can Kurt, Gagan Agrawal
2014 SC14: International Conference for High Performance Computing, Networking, Storage and Analysis  
Subsequently, we develop techniques for the runtime system to automatically partition and re-partition the work among heterogeneous processors or nodes.  ...  We explain how stencil computations, unstructured grid computations, and molecular dynamics applications can be expressed using these simple concepts.  ...  AUTOMATIC REPARTITIONING OF DOMAINS A key aspect of our programming model is automated partitioning of the domain for array-based computations, to allow efficient execution on heterogeneous collection  ... 
doi:10.1109/sc.2014.76 dblp:conf/sc/KurtA14 fatcat:66jbg7a3wreghddvtmgo5xhwhq

High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures

Pei Li, Elisabeth Brunet, Raymond Namyst
2013 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing  
Heterogeneous architectures have been widely used in the domain of high performance computing.  ...  The experiment of iterative stencil loop code (ISL) shows that our tool is efficient. It guarantees the minimum data exchanges and achieves high performance on heterogeneous multi-device architecture.  ...  Moreover, many heterogeneous systems start having multiple computing devices.  ... 
doi:10.1109/hpcc.and.euc.2013.213 dblp:conf/hpcc/LiBN13 fatcat:s7dazugberfibp5etssrypojny

An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers

Jason Cong, Peng Li, Bingjun Xiao, Peng Zhang
2016 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of non-uniform memory partitioning.  ...  We use the stencil computation, a popular communication-intensive application domain, as a case study to show the potential benefits of non-uniform memory partitioning.  ...  This work is partially supported by the Center for Domain-Specific Computing (CDSC), and C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.  ... 
doi:10.1109/tcad.2015.2488491 fatcat:c7cru3kn65g5fmkbfuyjtw42oy

An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers

Jason Cong, Peng Li, Bingjun Xiao, Peng Zhang
2014 Proceedings of the 51st Annual Design Automation Conference - DAC '14  
Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of non-uniform memory partitioning.  ...  We use the stencil computation, a popular communication-intensive application domain, as a case study to show the potential benefits of non-uniform memory partitioning.  ...  This work is partially supported by the Center for Domain-Specific Computing (CDSC), and C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.  ... 
doi:10.1145/2593069.2593090 dblp:conf/dac/CongLXZ14 fatcat:7yiii4ix7vh3vfdhryt7edjbz4

An optimal microarchitecture for stencil computation acceleration based on non-uniform partitioning of data reuse buffers

Jason Cong, Peng Li, Bingjun Xiao, Peng Zhang
2014 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC)  
Prior work on memory partitioning of data reuse buffers is limited to uniform partitioning. In this paper, we perform an early-stage exploration of non-uniform memory partitioning.  ...  We use the stencil computation, a popular communication-intensive application domain, as a case study to show the potential benefits of non-uniform memory partitioning.  ...  This work is partially supported by the Center for Domain-Specific Computing (CDSC), and C-FAR, one of six centers of STARnet, a Semiconductor Research Corporation program sponsored by MARCO and DARPA.  ... 
doi:10.1109/dac.2014.6881404 fatcat:br2ce6vthnhx7aajk3zctcpg4a

Automatic OpenCL Code Generation for Multi-device Heterogeneous Architectures

Pei Li, Elisabeth Brunet, Francois Trahay, Christian Parrot, Gael Thomas, Raymond Namyst
2015 2015 44th International Conference on Parallel Processing  
, (ii) the performance of an application written using STEPOCL competes with a handwritten version, (iii) larger workloads run on multiple devices that do not fit in the memory of a single device, (iv)  ...  However, writing an application for multiple accelerators remains today challenging because going from a single accelerator to multiple ones indeed requires to deal with potentially nonuniform domain decomposition  ...  STATE OF THE ART As heterogeneous architectures are becoming ubiquitous, many studies have focused on alleviating heterogeneous systems programming.  ... 
doi:10.1109/icpp.2015.105 dblp:conf/icpp/LiBTPTN15 fatcat:xthxevn27bcvlksgnuq7kj6u6i

Have abstraction and eat performance, too: optimized heterogeneous computing with parallel patterns

Kevin J. Brown, HyoukJoong Lee, Tiark Rompf, Arvind K. Sujeeth, Christopher De Sa, Christopher Aberger, Kunle Olukotun
2016 Proceedings of the 2016 International Symposium on Code Generation and Optimization - CGO 2016  
High performance in modern computing platforms requires programs to be parallel, distributed, and run on heterogeneous hardware.  ...  Unfortunately existing systems tend to fall short of these requirements.  ...  This work is supported by DARPA Contract-Air Force, Xgraphs; Language and Algorithms for Heterogeneous Graph Streams,  ... 
doi:10.1145/2854038.2854042 dblp:conf/cgo/BrownLRSSAO16 fatcat:cye5j5gi3vfgzh7xyku5cfttq4

A Pattern Specification and Optimizations Framework for Accelerating Scientific Computations on Heterogeneous Clusters

Linchuan Chen, Xin Huo, Gagan Agrawal
2015 2015 IEEE International Parallel and Distributed Processing Symposium  
Such systems can be extremely hard to program because of the underlying heterogeneity and the need for exploiting parallelism at multiple levels.  ...  By developing APIs for generalized reductions, irregular reductions, and stencil computations, we show that several complex scientific applications can be supported.  ...  Stencil Computations: Stencil applications involve computations of updating each element in the input based on the values of its neighbor elements.  ... 
doi:10.1109/ipdps.2015.13 dblp:conf/ipps/ChenHA15 fatcat:w6nxwycib5byhhaybm76ntdy44

Unleashing the performance of ccNUMA multiprocessor architectures in heterogeneous stencil computations

Lukasz Szustak, Kamil Halbiniak, Roman Wyrzykowski, Ondřej Jakl
2018 Journal of Supercomputing  
This paper meets the challenge of harnessing the heterogeneous communication architecture of ccNUMA multiprocessors for heterogeneous stencil computations, an important example of which is the Multidimensional  ...  We propose a method for optimization of parallel implementation of heterogeneous stencil computations that is a combination of the islands-of-core strategy and (3+1)D decomposition.  ...  Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution  ... 
doi:10.1007/s11227-018-2460-0 fatcat:2g6ta3ve3vd77i76ycveb3u7vu

Methods to Load Balance a GCR Pressure Solver Using a Stencil Framework on Multi- and Many-Core Architectures

Milosz Ciznicki, Michal Kulczewski, Piotr Kopta, Krzysztof Kurowski
2015 Scientific Programming  
Extra care must be taken for communication-intensive algorithms, which may be a bottleneck for forthcoming era of exascale computing.  ...  to benefit from new high-performance computing machines.  ...  Acknowledgments This work is supported by the Polish National Center of Science under Grant no. UMO-2011/03/B/ST6/03500. This research was supported in part by PL-Grid Infrastructure.  ... 
doi:10.1155/2015/648752 fatcat:ocavvk5iajcshorqgimongbnbu

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

Mohammed Sourouri, Scott B. Baden, Xing Cai
2016 International journal of parallel programming  
This paper describes a new compiler framework for heterogeneous 3D stencil computation on GPU clusters.  ...  We have therefore decided to restrict Panda's applicability to 3D stencil computations on structured grids. While  ...  At the moment of writing, Panda is not equipped with a dedicated runtime system that can automatically detect the number of CPU cores available on a target system.  ... 
doi:10.1007/s10766-016-0454-1 fatcat:7qh6satsl5at7h5guohchgjqxa

Adaptive Partitioning for Iterated Sequences of Irregular OpenCL Kernels

Pierre Huchant, Denis Barthou, Marie-Christine Counilh
2018 2018 30th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
We propose in this paper a static/dynamic approach for the execution of an iterated sequence of data-dependent kernels on a multi-device heterogeneous architecture.  ...  The method allows to automatically distribute irregular kernels onto multiple devices and tackles, without training, both load balancing and data transfers issues coming from hardware heterogeneity, load  ...  For the first iteration, each kernel is partitioned using a Uniform strategy. For the following ones, the partitioning of each kernel is computed by solving a linear system.  ... 
doi:10.1109/cahpc.2018.8645867 dblp:conf/sbac-pad/HuchantBC18 fatcat:yhqgcjokdrbajai2m7xdfoqsiy

Liszt

Zachary DeVito, Karthik Duraisamy, Eric Darve, Juan Alonso, Pat Hanrahan, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham (+1 others)
2011 Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis - SC '11  
Heterogeneous computers with processors and accelerators are becoming widespread in scientific computing.  ...  This approach allows Liszt applications to perform within 12% of hand-written C++, scale to large clusters, and experience order-of-magnitude speedups on GPUs.  ...  To reach exascale computing, we will need even more power-efficient platforms, which are likely to use heterogeneous architectures. However, programming such systems has proven problematic.  ... 
doi:10.1145/2063384.2063396 dblp:conf/sc/DeVitoJPOMBEHADDAH11 fatcat:4473i227izhkjj7wo6nqteenty

PARTANS

Thibaut Lutz, Christian Fensch, Murray Cole
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
In this article, we focus on abstracting the complexity of multi-GPU programming for stencil computation.  ...  We show that the best strategy depends not only on the stencil operator, problem size, and GPU, but also on the PCI express layout.  ...  GOALS OF THIS WORK The area of autotuning stencil computations on GPU-enabled systems is wide and challenging.  ... 
doi:10.1145/2400682.2400718 fatcat:mmf6rd2ne5h4zn6ynms4yzfpwm

Heterogeneous CPU-GPU Execution of Stencil Applications

Balint Siklosi, Istvan Z Reguly, Gihan R Mudalige
2018 2018 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC)  
Portability across different CPUs and GPUs is becoming paramount, and heterogeneous scheduling of computations is also of increasing interest to make full use of these systems.  ...  In this paper we present research on the hybrid CPU-GPU execution of an important class of applications: structured mesh stencil codes.  ...  Future systems are also likely to be heterogeneous in some form.  ... 
doi:10.1109/p3hpc.2018.00010 fatcat:gfwnxurlmnhezjfv5g2qdliw4u