
Programming Heterogeneous Multicore Systems Using Threading Building Blocks [chapter]

George Russell, Paul Keir, Alastair F. Donaldson, Uwe Dolinsky, Andrew Richards, Colin Riley
2011 Lecture Notes in Computer Science  
Codeplay's Offload C++ provides a single-source, POSIX threads-like approach to programming heterogeneous multicore devices where cores are equipped with private, local memories; code to move data between  ...  memory spaces is generated automatically.  ...  The compiler ensures that access to data declared in host memory results in the generation of appropriate data-movement code. The primary mechanism for data movement on Cell is DMA.  ... 
doi:10.1007/978-3-642-21878-1_15 fatcat:rdp2ruwwmffnravrse322kzxku

A systems perspective on GPU computing

Naila Farooqui
2016 Proceedings of the 9th Annual Workshop on General Purpose Processing using Graphics Processing Unit - GPGPU '16  
To this end, his contributions include novel scheduling and resource management abstractions, runtime specialization, and novel data management techniques to support scalable, distributed GPU frameworks  ...  In this paper, we summarize his legacy of key research contributions in general-purpose GPU computing.  ...  Acknowledgments We would like to thank Professor Sudhakar Yalamanchili, Ada Gavrilovska, Vishakha Gupta, Sudarsun Kannan, Alexander Merritt, and Dipanjan Sengupta for their feedback and assistance with  ... 
doi:10.1145/2884045.2884057 dblp:conf/ppopp/Farooqui16 fatcat:lcxhf6nfsvannnbp5lusxudmmu

Runtime-Aware Architectures: A First Approach

2014 Supercomputing Frontiers and Innovations  
With the irruption of multi-cores and parallel applications, this simple interface started to leak.  ...  In this paper, we introduce a first approach towards a Runtime-Aware Architecture (RAA), a massively parallel architecture designed from the runtime's perspective.  ...  We would like to thank Alex Ramirez, Osman Unsal, Adrian Cristal, Mario Nemirovsky, Ramon Beivide, Alejandro Rico and all the RoMoL team for the prolific discussions and all the feedback that we received  ... 
doi:10.14529/jsfi140102 fatcat:4bh33566cfbz7iylsf2ufppsfa

Framework for Efficient Cosimulation and Fast Prototyping on Multi-Components with AAA Methodology: LAR Codec Study Case

Erwan Flécher, Mickael Raulet, Ghislain Roquier, Marie Babel, O. Deforges
2007 Zenodo  
In our framework, a generic MATLAB function for data read or display can be easily included in a C application thanks to automatic code generation.  ...  It should be noted that data transfers between processes are automatically generated and synchronized by SynDEx using automatic code generation.  ... 
doi:10.5281/zenodo.40543 fatcat:rf3kwvaherfv5mckeubyqojiru

Hierarchical Dataflow Model for efficient programming of clustered manycore processors

Julien Hascoet, Karol Desnos, Jean-Francois Nezan, Benoit Dupont de Dinechin
2017 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded  ...  This paper introduces a technique for deploying hierarchical dataflow graphs efficiently onto MPSoC.  ...  The automatic optimization provided with code generation allows for the limitation of data movements, which are both very time and power consuming.  ... 
doi:10.1109/asap.2017.7995270 dblp:conf/asap/HascoetDND17 fatcat:zrl6ffctonhrzixhzhh35nbzdy

An Efficient Stream Buffer Mechanism for Dataflow Execution on Heterogeneous Platforms with GPUs

Ana Balevic, Bart Kienhuis
2011 2011 First Workshop on Data-Flow Execution Models for Extreme Scale Computing  
and data parallelism), but also has a potential to become an effective solution for reducing I/O overheads.  ...  We approach mapping of streaming applications onto heterogeneous architectures using a Process Network (PN) model of computation.  ...  We implemented the SB using a distributed memory approach with double buffering.  ... 
doi:10.1109/dfm.2011.10 fatcat:uj75oy3rs5ftjagajms5ujmlrq

Application Performance on a Cluster-Booster System

Anke Kreuzer, Norbert Eicker, Jorge Amaya, Estela Suarez
2018 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
They evolve around an innovative concept for heterogeneous systems: the Cluster-Booster architecture. In it, a general purpose cluster is tightly coupled to a many-core system (the Booster).  ...  This paper presents for the first time measurements done by a real world scientific application demonstrating the performance gain achieved with this kind of code-partition approach.  ...  Nüssle from EXTOLL GmbH for the MPI benchmarks on the Tourmalet network (figure 3). Part of the research presented here has received funding from the European Community's Seventh  ... 
doi:10.1109/ipdpsw.2018.00019 dblp:conf/ipps/KreuzerEAS18 fatcat:c6bvkm47uvdtxgvmaxixosrntu

A Holistic Dataflow-Inspired System Design

Stephane Zuckerman, Haitao Wei, Guang R. Gao, Howard Wong, Jean-Luc Gaudiot, Ahmed Louri
2014 2014 Fourth Workshop on Data-Flow Execution Models for Extreme Scale Computing  
To meet these challenges, heterogeneity in design, both at the architecture and technology levels, will be the prevailing approach for energy efficient computing as specialized cores, accelerators, etc  ...  Thus, data locality, already a must-have in high-performance computing, will become even more critical as memory technology progresses.  ...  be placed in the shared memory hierarchy in order to exploit locality and minimize data movement.  ... 
doi:10.1109/dfm.2014.16 fatcat:s27c5pidjvawjgb3jeuafxpoga

Exascale Machines Require New Programming Paradigms and Runtimes

2015 Supercomputing Frontiers and Innovations  
future systems, the main changing factor will be the substantially higher levels of concurrency, asynchrony, failures and heterogeneous architectures.  ...  The second section describes how data shift the paradigm of processor-centric management toward a data-centric one in next generation systems.  ...  distributed heterogeneous architectures. libWater aims to improve both productivity and implementation efficiency when parallelizing an application targeting a heterogeneous platform by achieving two  ... 
doi:10.14529/jsfi150201 fatcat:ozj4czefxrd37j7djcxuukyuee

TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation [article]

Karim Djemame and Django Armstrong and Richard Kavanagh and Jean-Christophe Deprez and Ana Juan Ferrer and David Garcia Perez and Rosa Badia and Raul Sirvent and Jorge Ejarque and Yiannis Georgiou
2016 arXiv   pre-print
The paper is concerned with the issue of how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources.  ...  To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans.  ...  Acknowledgments This work is partly supported by the European Commission under H2020-ICT-20152 contract 687584 -Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation (  ... 
arXiv:1603.01407v1 fatcat:3yjffrybxfbondmjgq5vy5fjd4

Towards an Energy-Aware Framework for Application Development and Execution in Heterogeneous Parallel Architectures [chapter]

Karim Djemame, Richard Kavanagh, Vasilios Kelefouras, Adrià Aguilà, Jorge Ejarque, Rosa M. Badia, David García Pérez, Clara Pezuela, Jean-Christophe Deprez, Lotfi Guedria, Renaud De Landtsheer, Yiannis Georgiou
2018 Hardware Accelerators in Data Centers  
Moreover, a programming model with built-in support for various hardware architectures including heterogeneous clusters, heterogeneous chips and programmable logic devices is provided.  ...  operation for Heterogeneous Parallel Hardware (HPA) environments.  ...  tools to optimize various dimensions of software design and operations (energy efficiency, performance, data movement and location, cost, time-criticality, security, dependability on target architectures  ... 
doi:10.1007/978-3-319-92792-3_7 fatcat:hjiue3alirhvxmdwcsz7amlese

Encapsulated Synchronization and Load-Balance in Heterogeneous Programming [chapter]

Yuri Torres, Arturo Gonzalez-Escribano, Diego Llanos
2012 Lecture Notes in Computer Science  
We show with an example how to produce a parallel code that can be used to efficiently run on systems ranging from a Beowulf cluster to a machine with mixed GPUs.  ...  . (4) We show with an example how to produce a single parallel code that adapts the computation to efficiently run on systems ranging from a Beowulf cluster to a machine with mixed GPUs.  ...  The intrinsic complexity of the code generation for heterogeneous systems increases every time we add any different hardware device.  ... 
doi:10.1007/978-3-642-32820-6_50 fatcat:tbcuebblrnawtbmuhcdww4zkci

Platform-based software design flow for heterogeneous MPSoC

Katalin Popovici, Xavier Guerin, Frederic Rousseau, Pier Stanislao Paolucci, Ahmed Amine Jerraya
2008 ACM Transactions on Embedded Computing Systems  
We applied this approach on a multimedia platform, involving a high performance DSP and a RISC processor, to explore communication architecture and generate an efficient executable code for a multimedia  ...  Programming these architectures usually results in writing separate low-level code for the different processors (DSP, microcontroller), implying late global validation of the overall application with the  ...  CONCLUSION In this article, we presented a platform-based software design flow allowing efficient software code generation and validation for architectures including heterogeneous MPSoC with specific I  ... 
doi:10.1145/1376804.1376807 fatcat:pacsjturcjfxdmafbhjhcnfqya

A memory heterogeneity-aware runtime system for bandwidth-sensitive HPC applications

Kavitha Chandrasekar, Xiang Ni, Laxmikant V. Kale
2017 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
We implement a data movement mechanism managed by the runtime system which allows applications to run efficiently on architectures with heterogeneous memory hierarchy, with trivial code changes.  ...  In architectures with heterogeneity in memory types within a node, efficient allocation and data movement can result in improved performance and energy savings in future systems if all the data requests  ... 
doi:10.1109/ipdpsw.2017.168 dblp:conf/ipps/ChandrasekarNK17 fatcat:tm2vcyo7qbe2rdtzimhfv2o47q

Parallel Programming Models for Heterogeneous Many-Cores : A Survey [article]

Jianbin Fang, Chun Huang, Tao Tang, Zheng Wang
2020 arXiv   pre-print
In this article, we provide a comprehensive survey of parallel programming models for heterogeneous many-core architectures and review the compiling techniques for improving programmability and portability  ...  While heterogeneous many-core design offers the potential for energy-efficient high performance, such potential can only be unlocked if the application programs are suitably parallel and can be made to  ...  By orchestrating the compiler and the runtime system, the proposed system can efficiently manage the necessary data movements among multiple GPUs' memories.  ... 
arXiv:2005.04094v1 fatcat:e2psrdnyajh3hih3znnjjbezae