Embedded Runtime for Reconfigurable Dataflow Graphs on Manycore Architectures
2018
Proceedings of the 9th Workshop and 7th Workshop on Parallel Programming and RunTime Management Techniques for Manycore Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms - PARMA-DITAM '18
An open-source implementation on the Kalray MPPA® processor demonstrates the feasibility and the great potential of such a runtime. ...
This paper introduces the first embedded runtime manager enabling the execution of reconfigurable dataflow graphs on a Non-Uniform Memory Access (NUMA) architecture. ...
The software component responsible for managing the graph reconfigurations is called a dataflow runtime. ...
doi:10.1145/3183767.3183780
dblp:conf/hipeac/MiomandreHDMDN18
fatcat:sj5437ax2jgcrdkhv6ss4o4dom
Towards an Automatic Co-generator for Manycores' Architecture and Runtime: STHORM case-study
2015
Procedia Computer Science
This part is highly parallelizable by sub-images and dynamic, thus can be run on multiple processors. This is a promising property for the DSE. ...
The SW runtime used in this study [4] is a set of libraries (communication, execution engines, synchronization, resource management…) where the resource management library is built on top of the HDS ...
doi:10.1016/j.procs.2015.05.439
fatcat:unabgl3c6jh3lnob72525pr47q
Hardware and Software Tradeoffs for Task Synchronization on Manycore Architectures
[chapter]
2011
Lecture Notes in Computer Science
This paper describes the implementation of a high-level synchronization construct called phasers on the IBM Cyclops64 manycore processor, and compares phasers to lower-level synchronization primitives currently ...
To take advantage of this massive parallelism in practice requires a productive parallel programming model, and an efficient runtime for the scheduling and coordination of concurrent tasks. ...
Acknowledgments We wish to thank Vincent Cavé and Joshua Landwehr for their hard work on the correctness, performance and efficiency of the Habanero-C runtime. ...
doi:10.1007/978-3-642-23397-5_12
fatcat:gllhxbr45ncqbgzp5x5gq4biji
Towards software performance engineering for multicore and manycore systems
2014
Performance Evaluation Review
In the era of multicore and manycore processors, a systematic engineering approach for software performance becomes more and more crucial to the success of modern software systems. ...
This article argues for more software performance engineering research specifically for multicore and manycore systems, which will have a profound impact on software engineering practices. ...
Acknowledgements We are grateful to the team at Schloss Dagstuhl for hosting the seminar that led to this paper. We thank all participants for their contributions and the engaging discussions. ...
doi:10.1145/2567529.2567531
fatcat:j24fiw7ulrfedc3vmce7kcy3ri
A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore
2018
2018 IEEE High Performance extreme Computing Conference (HPEC)
This framework is implemented and evaluated on the 2nd-generation Kalray MPPA® clustered manycore processor. ...
While highly efficient OpenVX implementations exist for shared memory multi-core processors, targeting OpenVX to clustered manycore processors appears challenging. ...
Efficient programming of such manycore processors is challenging, as application software must distribute processing on the clusters and use the local memories as scratch-pad. ...
doi:10.1109/hpec.2018.8547736
dblp:conf/hpec/HascoetDDN18
fatcat:ri7iejxc2ffmhk3sntlilwxniq
Partially Separated Page Tables for Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous Architectures
2013
2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing
We provide experimental results on stencil computation, a common HPC kernel, and show that OS assisted memory management has the potential for scalable transparent data movement. ...
In this paper we propose an application transparent, operating system (OS) assisted hierarchical memory management system, where the OS orchestrates data movement between the host and the device and updates ...
We would like to express our gratitude to Intel Japan for providing the hardware, software and technical support associated with the Intel® Xeon Phi™ product family. ...
doi:10.1109/ccgrid.2013.59
dblp:conf/ccgrid/GerofiSHI13
fatcat:db6bcbjxpvhdflg7eh254tc5pa
Open Tiled Manycore System-on-Chip
[article]
2013
arXiv
pre-print
Manycore Systems-on-Chip include an increasing number of processing elements and have become an important research topic for improvements of both hardware and software. ...
With the Open Tiled Manycore System-on-Chip (OpTiMSoC) we aim at building such an environment for use in our and other research projects as prototyping platform. ...
Lean runtime system Based on the runtime system a simple runtime system is implemented that provides the central functions needed for more sophisticated systems: a thread scheduler and virtual memory management ...
arXiv:1304.5081v1
fatcat:ffcgqiahivggzme5v2ggf2y6um
Extended Cyclostatic Dataflow Program Compilation and Execution for an Integrated Manycore Processor
2013
Procedia Computer Science
In this context, we present a compilation toolchain for the ΣC language, which allows the hierarchical construction of stream applications and automatic mapping of this application to an embedded manycore ...
As a demonstration of this toolchain, we present an implementation of a H.264 encoder and evaluate its performance on Kalray's embedded manycore MPPA chip. ...
These I/O clusters are in charge of managing I/O data exchanges between either external buses (e.g. PCIe) or SDRAM. As other clusters they have a local processor for management and interface. ...
doi:10.1016/j.procs.2013.05.330
fatcat:4kw6fqbkubdhtnwptin2fai7aq
Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors
2015
Scientific Programming
We present a data distribution and locality-aware scheduling technique for task-based OpenMP programs executing on NUMA systems and manycore processors. ...
Performance degradation due to nonuniform data access latencies has worsened on NUMA systems and can now be felt on-chip in manycore processors. ...
Acknowledgments The work was partially funded by European FP7 project ENCORE Project Grant Agreement no. 248647 and Artemis PaPP Project no. 295440. ...
doi:10.1155/2015/981759
fatcat:r6gun4qwdfbezg4nnx63zmo7nm
Hierarchical Dataflow Model for efficient programming of clustered manycore processors
2017
2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)
Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded ...
Dataflow Models of Computation (MoCs) are increasingly used for developing parallel applications as their high-level of abstraction eases the automation of mapping, task scheduling and memory allocation ...
The hierarchy feature of the IBSDF MoC is used for the mapping of computation on PEs, for code generation, and for efficient management of off- and on-chip communications. ...
doi:10.1109/asap.2017.7995270
dblp:conf/asap/HascoetDND17
fatcat:zrl6ffctonhrzixhzhh35nbzdy
Kokkos: Enabling Performance Portability Across Manycore Architectures
2013
2013 Extreme Scaling Workshop (xsw 2013)
Results presented in this paper are for pre-production Intel Xeon Phi co-processors (codenamed Knights Corner) and pre-production versions of Intel's Xeon Phi software stack. ...
The two foundational abstractions of Kokkos are (1) dispatch work to a manycore device for parallel execution and (2) manage multidimensional arrays with polymorphic layouts. ...
ACKNOWLEDGMENT Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. ...
doi:10.1109/xsw.2013.7
fatcat:p6t6h3yfxraetjz5wktlsn7uwu
Experience on optimizing irregular computation for memory hierarchy in manycore architecture
2008
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming - PPoPP '08
Percolation for creating locality just in time achieves sub-linear speedup on C64 with software-managed memory hierarchy. ...
Lock-based consistency for managing irregular producer-consumer coherence also obtains reasonable scalability on GodsonT with hardware-managed memory hierarchy. ...
doi:10.1145/1345206.1345255
dblp:conf/ppopp/TanFZRG08
fatcat:lhzkf4hz7zdgxl4nkdcndf56z4
Hosting an object heap on manycore hardware
2009
Proceedings of the 5th symposium on Dynamic languages - DLS '09
Our design relies on an object table, and the exploitation of a user-managed caching regime for read-mostly objects. ...
Our system is far from complete, let alone optimal, but our experiences have helped us develop new intuitions needed to rise to the manycore software challenge. ...
for its passionate advancement and preservation of the original Smalltalk IDE; Leo Ungar for his editing; and Richard Schooler, VP SW Engineering at Tilera, and his team for their excellent support during ...
doi:10.1145/1640134.1640149
dblp:conf/dls/UngarA09
fatcat:jzt632nnfzh7jcv7urjibhk3by
HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA
[article]
2017
arXiv
pre-print
HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host ...
Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient ...
PULP is an architectural template for scalable, energy-efficient processing that combines an explicitly-managed memory hierarchy, ISA extensions and compiler support for specialized DSP instructions, and ...
arXiv:1712.06497v1
fatcat:gwexa42crjb6nceyzieeflquyy
On the Performance and Isolation of Asymmetric Microkernel Design for Lightweight Manycores
2019
2019 IX Brazilian Symposium on Computing Systems Engineering (SBESC)
Also, our results unveil co-design aspects between an OS kernel and the architecture of lightweight manycore, concerning the memory system and core grouping. ...
While several multikernel OS designs are possible, in this work we argue on one that is structured in asymmetric microkernel instances. ...
a software-managed Memory Management Unit (MMU). ...
doi:10.1109/sbesc49506.2019.9046080
dblp:conf/sbesc/PennaSLCBFM19
fatcat:oe576cky2jfp3h4hynrszytmbq