Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures

Hugo Miomandre, Julien Hascoët, Karol Desnos, Kevin J. M. Martin, Benoît Dupont de Dinechin, Jean-François Nezan
2018 Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '18  
An open-source implementation on the Kalray MPPA® processor demonstrates the feasibility and the great potential of such a runtime.  ...  This paper introduces the first embedded runtime manager enabling the execution of reconfigurable dataflow graphs on a Non-Uniform Memory Access (NUMA) architecture.  ...  The software component responsible for managing the graph reconfigurations is called a dataflow runtime.  ... 
doi:10.1145/3183767.3183780 dblp:conf/hipeac/MiomandreHDMDN18 fatcat:sj5437ax2jgcrdkhv6ss4o4dom

Towards an Automatic Co-generator for Manycores' Architecture and Runtime: STHORM case-study

Charly Bechara, Karim Ben Chehida, Farhat Thabet
2015 Procedia Computer Science  
This part is highly parallelizable by sub-images and dynamic, thus can be run on multiple processors. This is a promising property for the DSE.  ...  The SW runtime used in this study [4] is a set of libraries (communication, execution engines, synchronization, resource management…) where the resource management library is built on top of the HDS  ... 
doi:10.1016/j.procs.2015.05.439 fatcat:unabgl3c6jh3lnob72525pr47q

Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures [chapter]

Yonghong Yan, Sanjay Chatterjee, Daniel A. Orozco, Elkin Garcia, Zoran Budimlić, Jun Shirako, Robert S. Pavel, Guang R. Gao, Vivek Sarkar
2011 Lecture Notes in Computer Science  
This paper describes the implementation of a high-level synchronization construct called phasers on the IBM Cyclops64 manycore processor, and compares phasers to lower-level synchronization primitives currently  ...  To take advantage of this massive parallelism in practice requires a productive parallel programming model, and an efficient runtime for the scheduling and coordination of concurrent tasks.  ...  Acknowledgments We wish to thank Vincent Cavé and Joshua Landwehr for their hard work on the correctness, performance and efficiency of the Habanero-C runtime.  ... 
doi:10.1007/978-3-642-23397-5_12 fatcat:gllhxbr45ncqbgzp5x5gq4biji

Towards software performance engineering for multicore and manycore systems

Heiko Koziolek, Steffen Becker, Jens Happe, Petr Tuma, Thijmen de Gooijer
2014 Performance Evaluation Review  
In the era of multicore and manycore processors, a systematic engineering approach for software performance becomes more and more crucial to the success of modern software systems.  ...  This article argues for more software performance engineering research specifically for multicore and manycore systems, which will have a profound impact on software engineering practices.  ...  Acknowledgements We are grateful to the team at Schloss Dagstuhl for hosting the seminar that led to this paper. We thank all participants for their contributions and the engaging discussions.  ... 
doi:10.1145/2567529.2567531 fatcat:j24fiw7ulrfedc3vmce7kcy3ri

A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore

Julien Hascoët, Benoît Dupont de Dinechin, Karol Desnos, Jean-François Nezan
2018 2018 IEEE High Performance extreme Computing Conference (HPEC)  
This framework is implemented and evaluated on the 2nd-generation Kalray MPPA® clustered manycore processor.  ...  While highly efficient OpenVX implementations exist for shared memory multi-core processors, targeting OpenVX to clustered manycore processors appears challenging.  ...  Efficient programming of such manycore processors is challenging, as application software must distribute processing on the clusters and use the local memories as scratch-pad.  ... 
doi:10.1109/hpec.2018.8547736 dblp:conf/hpec/HascoetDDN18 fatcat:ri7iejxc2ffmhk3sntlilwxniq

Partially Separated Page Tables for Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous Architectures

B. Gerofi, A. Shimada, A. Hori, Y. Ishikawa
2013 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
We provide experimental results on stencil computation, a common HPC kernel, and show that OS assisted memory management has the potential for scalable transparent data movement.  ...  In this paper we propose an application transparent, operating system (OS) assisted hierarchical memory management system, where the OS orchestrates data movement between the host and the device and updates  ...  We would like to express our gratitude to Intel Japan for providing the hardware, software and technical support associated with the Intel® Xeon Phi™ product family.  ... 
doi:10.1109/ccgrid.2013.59 dblp:conf/ccgrid/GerofiSHI13 fatcat:db6bcbjxpvhdflg7eh254tc5pa

Open Tiled Manycore System-on-Chip [article]

Stefan Wallentowitz, Philipp Wagner, Michael Tempelmeier, Thomas Wild, Andreas Herkersdorf
2013 arXiv   pre-print
Manycore System-on-Chip include an increasing amount of processing elements and have become an important research topic for improvements of both hardware and software.  ...  With the Open Tiled Manycore System-on-Chip (OpTiMSoC) we aim at building such an environment for use in our and other research projects as prototyping platform.  ...  Lean runtime system Based on the runtime system a simple runtime system is implemented that provides the central functions needed for more sophisticated systems: a thread scheduler and virtual memory management  ... 
arXiv:1304.5081v1 fatcat:ffcgqiahivggzme5v2ggf2y6um

Extended Cyclostatic Dataflow Program Compilation and Execution for an Integrated Manycore Processor

Pascal Aubry, Pierre-Edouard Beaucamps, Frédéric Blanc, Bruno Bodin, Sergiu Carpov, Loïc Cudennec, Vincent David, Philippe Dore, Paul Dubrulle, Benoît Dupont de Dinechin, François Galea, Thierry Goubier (+8 others)
2013 Procedia Computer Science  
In this context, we present a compilation toolchain for the ΣC language, which allows the hierarchical construction of stream applications and automatic mapping of this application to an embedded manycore  ...  As a demonstration of this toolchain, we present an implementation of a H.264 encoder and evaluate its performance on Kalray's embedded manycore MPPA chip.  ...  These I/O clusters are in charge of managing I/O data exchanges between either external buses (e.g. PCIe) or SDRAM. As other clusters they have a local processor for management and interface.  ... 
doi:10.1016/j.procs.2013.05.330 fatcat:4kw6fqbkubdhtnwptin2fai7aq

Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors

Ananya Muddukrishna, Peter A. Jonsson, Mats Brorsson
2015 Scientific Programming  
We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors.  ...  Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors.  ...  Acknowledgments The work was partially funded by European FP7 project ENCORE Project Grant Agreement no. 248647 and Artemis PaPP Project no. 295440.  ... 
doi:10.1155/2015/981759 fatcat:r6gun4qwdfbezg4nnx63zmo7nm

Hierarchical Dataflow Model for efficient programming of clustered manycore processors

Julien Hascoet, Karol Desnos, Jean-Francois Nezan, Benoit Dupont de Dinechin
2017 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded  ...  Dataflow Models of Computation (MoCs) are increasingly used for developing parallel applications as their high level of abstraction eases the automation of mapping, task scheduling and memory allocation  ...  The hierarchy feature of the IBSDF MoC is used for the mapping of computation on PEs, for code generation, and for efficient management of off- and on-chip communications.  ... 
doi:10.1109/asap.2017.7995270 dblp:conf/asap/HascoetDND17 fatcat:zrl6ffctonhrzixhzhh35nbzdy

Kokkos: Enabling Performance Portability Across Manycore Architectures

H. Carter Edwards, Christian R. Trott
2013 2013 Extreme Scaling Workshop (xsw 2013)  
Results presented in this paper are for pre-production Intel Xeon Phi co-processors (codenamed Knights Corner) and pre-production versions of Intel's Xeon Phi software stack.  ...  The two foundational abstractions of Kokkos are (1) dispatch work to a manycore device for parallel execution and (2) manage multidimensional arrays with polymorphic layouts.  ...  ACKNOWLEDGMENT Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S.  ... 
doi:10.1109/xsw.2013.7 fatcat:p6t6h3yfxraetjz5wktlsn7uwu

Experience on optimizing irregular computation for memory hierarchy in manycore architecture

Guangming Tan, Dongrui Fan, Junchao Zhang, Andrew Russo, Guang R. Gao
2008 Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08  
Percolation for creating locality just in time achieves sub-linear speedup on C64 with software-managed memory hierarchy.  ...  Lock-based consistency for managing irregular producer-consumer coherence also obtains reasonable scalability on GodsonT with hardware-managed memory hierarchy.  ... 
doi:10.1145/1345206.1345255 dblp:conf/ppopp/TanFZRG08 fatcat:lhzkf4hz7zdgxl4nkdcndf56z4

Hosting an object heap on manycore hardware

David Ungar, Sam S. Adams
2009 Proceedings of the 5th symposium on Dynamic languages - DLS '09  
Our design relies on an object table, and the exploitation of a user-managed caching regime for read-mostly objects.  ...  Our system is far from complete, let alone optimal, but our experiences have helped us develop new intuitions needed to rise to the manycore software challenge.  ...  for its passionate advancement and preservation of the original Smalltalk IDE; Leo Ungar for his editing; and Richard Schooler, VP SW Engineering at Tilera, and his team for their excellent support during  ... 
doi:10.1145/1640134.1640149 dblp:conf/dls/UngarA09 fatcat:jzt632nnfzh7jcv7urjibhk3by

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA [article]

Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini
2017 arXiv   pre-print
HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host  ...  Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient  ...  PULP is an architectural template for scalable, energy-efficient processing that combines an explicitly-managed memory hierarchy, ISA extensions and compiler support for specialized DSP instructions, and  ... 
arXiv:1712.06497v1 fatcat:gwexa42crjb6nceyzieeflquyy

On the Performance and Isolation of Asymmetric Microkernel Design for Lightweight Manycores

Pedro Henrique Penna, Joao Vicente Souto, Davidson Francis Lima, Marcio Castro, Francois Broquedis, Henrique Freitas, Jean-Francois Mehaut
2019 2019 IX Brazilian Symposium on Computing Systems Engineering (SBESC)  
Also, our results unveil co-design aspects between an OS kernel and the architecture of lightweight manycore, concerning the memory system and core grouping.  ...  While several multikernel OS designs are possible, in this work we argue on one that is structured in asymmetric microkernel instances.  ...  a software-managed Memory Management Unit (MMU).  ... 
doi:10.1109/sbesc49506.2019.9046080 dblp:conf/sbesc/PennaSLCBFM19 fatcat:oe576cky2jfp3h4hynrszytmbq