A Distributed Framework for Low-Latency OpenVX over the RDMA NoC of a Clustered Manycore

Julien Hascoët, Benoît Dupont de Dinechin, Karol Desnos, Jean-François Nezan
2018 2018 IEEE High Performance extreme Computing Conference (HPEC)  
OpenVX abstracts the target processor architecture complexity and automates the implementation of processing pipelines through high-level optimizations.  ...  Experimental results show that super-linear speed-ups are obtained for multi-cluster execution by leveraging the bandwidth of on-chip memories and the capabilities of asynchronous RDMA engines.  ...  INTRODUCTION Server and desktop systems are built from multi-core processors that integrate up to a few tens of highly complex Central Processing Units (CPUs) cores.  ... 
doi:10.1109/hpec.2018.8547736 dblp:conf/hpec/HascoetDDN18 fatcat:ri7iejxc2ffmhk3sntlilwxniq

Towards automatic actor pinning on multi-core architectures

Emilio Francesquini, Alfredo Goldman, Jean-François Méhaut
2012 Proceedings of the eleventh ACM SIGPLAN workshop on Erlang workshop - Erlang '12  
Using application profiling in association with hardware counters, it will pin and migrate actors to processing units aiming to optimize performance.  ...  In this model the execution relies on a runtime environment (RE) to be able to efficiently use the underlying machine.  ...  Acknowledgments This research has been partially funded by Hewlett-Packard Brazil under Project Baile and by CAPES under project CAPES/Cofecub.  ... 
doi:10.1145/2364489.2364501 dblp:conf/erlang/FrancesquiniGM12 fatcat:wlttw3rewrdvvgr3hzgphhb7r4

Parallelization Strategies for Hybrid Metaheuristics Using a Single GPU and Multi-core Resources [chapter]

Thé Van Luong, Eric Taillard, Nouredine Melab, El-Ghazali Talbi
2012 Lecture Notes in Computer Science  
As a result, the use of GPU computing has been recognized as a major way to speed up the search process.  ...  However, most GPU-accelerated algorithms of the literature do not take benefits of all the available CPU cores.  ...  independent S-metaheuristics on multi-core architectures.  ... 
doi:10.1007/978-3-642-32964-7_37 fatcat:ujkmsf2rznennktei5rdavssbi

OSCAR API for Real-Time Low-Power Multicores and Its Performance on Multicores and SMP Servers [chapter]

Keiji Kimura, Masayoshi Mase, Hiroki Mikami, Takamichi Miyamoto, Jun Shirako, Hironori Kasahara
2010 Lecture Notes in Computer Science  
with eight cores and 2.9 times speedup on RP2 with four cores, respectively.  ...  Due to low-power optimization on RP2, the OSCAR compiler with the OSCAR API achieves a maximum power reduction of 84% in the real-time execution mode.  ...  Specifications of OSCAR API are discussed at the Realtime Consumer Electronics Multicore Architecture and API Committee in these projects.  ... 
doi:10.1007/978-3-642-13374-9_13 fatcat:n75vuldrrzcfnpq76er5pwv6dq

AVSynDEx: A Rapid Prototyping Process Dedicated to the Implementation of Digital Image Processing Applications on Multi-DSP and FPGA Architectures

Virginie Fresse, Olivier Déforges, Jean-François Nezan
2002 EURASIP Journal on Advances in Signal Processing  
We present AVSynDEx (concatenation of AVS + SynDEx), a rapid prototyping process aiming to the implementation of digital signal processing applications on mixed architectures (multi-DSP + FPGA).  ...  These tools and architectures are judiciously selected and integrated during the implementation process to help a signal processing specialist without relevant hardware experience.  ...  The data flow graph is implemented on a multi-DSP + FPGA board. a distributed and optimized executive. A third tool, a translator between AVS and SynDEx realized the automatic link.  ... 
doi:10.1155/s1110865702205016 fatcat:5wbgcbwkdnbojbfbyrnzse6eey

A multigrain Delaunay mesh generation method for multicore SMT-based architectures

Christos D. Antonopoulos, Filip Blagojevic, Andrey N. Chernikov, Nikos P. Chrisochoides, Dimitrios S. Nikolopoulos
2009 Journal of Parallel and Distributed Computing  
Given the proliferation of layered, multicore- and SMT-based architectures, it is imperative to deploy and evaluate important, multi-level, scientific computing codes, such as meshing algorithms, on these  ...  The exploitation of the coarser degree of granularity facilitates scalability both in terms of execution time and problem size on loosely-coupled clusters.  ...  Acknowledgments This work was supported in part by the following NSF grants: EIA-9972853, ACI-0085963, EIA-0203974, ACI-0312980, Career award CCF-0346867, CNS-0521381 and DOE grant DE-FG02-05ER2568.  ... 
doi:10.1016/j.jpdc.2009.03.009 fatcat:ytds5g2b6jgn3m5mrvxnhgyivi

Comparison of Hybrid Sorting Algorithms Implemented on Different Parallel Hardware Platforms

Zurek Dominik, Pietron Marcin, Wielgosz Maciej, Wiatr Kazimierz
2013 Computer Science  
Recently, many-core and multi-core platforms have enabled the creation of wide parallel algorithms. We have standard processors that consist of multiple cores and hardware accelerators, like the GPU.  ...  In most cases, these describe the resulting time of sorting algorithm executions on the GPU platform and a single CPU core.  ...  The single cores in multi-core systems may implement architectures such as vector processing, SIMD, or multi-threading.  ... 
doi:10.7494/csci.2013.14.4.679 fatcat:fetcipfq25g3ndi6cump5rxw6u

The Multi-Core Era - Trends and Challenges [article]

Peter Tröger
2008 arXiv   pre-print
It discusses how 40 years of parallel computing research need to be considered in the upcoming multi-core era.  ...  The following article gives a summary and bibliography for recent trends and challenges in CMP architectures.  ...  A set of execution units can be put together to form a chip multi-processing (CMP) architecture [35].  ... 
arXiv:0810.5439v1 fatcat:p67elbfj2vf6xhonekhmkelpva

Implementation of GP-GPU with SIMT Architecture in the Embedded Environment

Kwang-yeob Lee, Jae-chang Kwak
2014 International Journal of Multimedia and Ubiquitous Engineering  
Since general purpose CPU has small number of core, which is optimized for serial processing, it has a limitation of parallel processing.  ...  For a single SP, an odd warp and an even warp are assigned and processed.  ... 
doi:10.14257/ijmue.2014.9.4.23 fatcat:rvomjxxksrbttc4ahzzdvhepoe

A Multicore-Aware Runtime Architecture for Scalable Service Composition

Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder
2010 2010 IEEE Asia-Pacific Services Computing Conference  
In this paper we present an innovative process execution engine architecture.  ...  Its design takes into account the specific constraints of multicore machines and scales well on different processor architectures, as shown by our extensive performance evaluation.  ...  CRSI22 127386), and by the European Community under the grant agreement no. EU-FP7-215483-S-Cube.  ... 
doi:10.1109/apscc.2010.61 dblp:conf/apscc/BonettaPPB10 fatcat:b7cv3fatcvc5jg55hj2wtaas3a

TaskGenX: A Hardware-Software Proposal for Accelerating Task Parallelism [chapter]

Kallia Chronaki, Marc Casas, Miquel Moreto, Jaume Bosch, Rosa M. Badia
2018 Lecture Notes in Computer Science  
TaskGenX offers a solution for minimizing task creation overheads and relies both on the runtime system and a dedicated hardware.  ...  However, the increasing number of cores on modern CMPs is pushing research towards the use of fine grained parallelism.  ...  RTopt can optimize its architecture, having a different cache hierarchy, pipeline configuration and specialized hardware structures to hold and process the SRT.  ... 
doi:10.1007/978-3-319-92040-5_20 fatcat:6egp6ckmfrd6vjwksfgdtgyqmu

Editorial: enabling technologies for programming extreme scale systems

Ching-Hsien Hsu
2012 Journal of Supercomputing  
Multi-core architecture presents a new trend, and core-based parallel processing algorithms will continue to become more important.  ...  programming, GPU implementation and optimization, execution on real-world parallel architecture and novel applications associated with this new paradigm.  ...  The run-time system itself is language independent and manages actor allocation, stream graph creation and stream program execution on multi-core architectures.  ... 
doi:10.1007/s11227-012-0745-2 fatcat:w74ggqhcibc27llbaq4nqe3oiq

Towards Scalable Service Composition on Multicores [chapter]

Daniele Bonetta, Achille Peternier, Cesare Pautasso, Walter Binder
2010 Lecture Notes in Computer Science  
Recent chip multi-processors combine several cores with a hierarchy of caches on a single processor.  ...  The advent of modern multicore machines, comprising several chip multi-processors each offering multiple cores and often featuring a large shared cache, offers the opportunity to redesign the architecture  ...  Acknowledgment We gratefully acknowledge the financial support of the Swiss National Science Foundation for the project "SOSOA: Self-Organizing Service-Oriented Architectures" (SNF Sinergia Project No.  ... 
doi:10.1007/978-3-642-16961-8_90 fatcat:pgvblwr62vadjcpof72tb2rliy

High-Performance Embedded Architecture and Compilation Roadmap [chapter]

Koen De Bosschere, Wayne Luk, Xavier Martorell, Nacho Navarro, Mike O'Boyle, Dionisios Pnevmatikatos, Alex Ramirez, Pascal Sainrat, André Seznec, Per Stenström, Olivier Temam
2007 Lecture Notes in Computer Science  
The HiPEAC roadmap is organized around 10 central themes: (i) single core architecture, (ii) multi-core architecture, (iii) interconnection networks, (iv) programming models and tools, (v) compilation,  ...  One of the key deliverables of the EU HiPEAC FP6 Network of Excellence is a roadmap on high-performance embedded architecture and compilation -the HiPEAC Roadmap for short.  ...  Challenge 6.1: Execution Environments for Heterogeneous Systems Runtimes and operating systems have to be aware that the architecture is a heterogeneous multi-core.  ... 
doi:10.1007/978-3-540-71528-3_2 fatcat:ywmebvj7wrfb3ojghsjs4w3fy4

Design and Simulation of High Performance Parallel Architectures Using the ISAC Language

Zdeněk Přikryl, Jakub Křoustek, Tomáš Hruška, Dušan Kolář, Karel Masařík, Adam Husár
2011 GSTF International Journal on Computing  
Design and testing of these complex systems is time-consuming and iterative process. Architecture description languages (ADLs) are one of the most effective solutions for single processor design.  ...  However, support for description of parallel architectures and multi-processor systems is very low or completely missing in nowadays ADLs.  ...  Multi-core Processors and Multi-processor System on a Chip A multi-core processor and can be described using the ISAC language as well. Each processor core is described in a separate model.  ... 
doi:10.5176/2010-2283_1.2.46 fatcat:ilspev4g4fhsnommrdef2qgiqe