Response Time Analysis of Synchronous Data Flow Programs on a Many-Core Processor

Hamza Rihani, Matthieu Moy, Claire Maiza, Robert I. Davis, Sebastian Altmeyer
2016 Proceedings of the 24th International Conference on Real-Time Networks and Systems - RTNS '16  
In this paper we introduce a response time analysis technique for Synchronous Data Flow programs mapped to multiple parallel dependent tasks running on a compute cluster of the Kalray MPPA-256 many-core  ...  The analysis we derive computes a set of response times and release dates that respect the constraints in the task dependency graph.  ...  EPSRC Research Data Management: No new primary data was created during this study.  ... 
doi:10.1145/2997465.2997472 dblp:conf/rtns/RihaniMMDA16 fatcat:s2wvwbsn5rgtrozcichaew6iqi

Integrated Worst-Case Execution Time Estimation of Multicore Applications

Dumitru Potop-Butucaru, Isabelle Puaut, Marc Herbstritt
2013 Worst-Case Execution Time Analysis  
Worst-case execution time (WCET) analysis has reached a high level of precision in the analysis of sequential programs executing on single-cores.  ...  In this paper we extend a state-of-the-art WCET analysis technique to compute tight WCET estimates of parallel applications running on multicores.  ...  Control-flow analysis. This phase extracts information about possible execution paths from the program source or binary. The output of this phase is a data structure representing the possible flows.  ... 
doi:10.4230/oasics.wcet.2013.21 dblp:conf/wcet/Potop-ButucaruP13 fatcat:qkir7t55mfgexczbptxlpcn3fy

Reconciling performance and predictability on a many-core through off-line mapping

Thomas Carle, Manel Djemal, Daniela Genius, Francois Pecheux, Dumitru Potop Butucaru, Robert de Simone, Franck Wajsburt, Zhen Zhang
2014 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC)  
We start from a general-purpose many-core architecture designed for average-case performance and ease of use.  ...  In particular, its distributed shared memory programming model allows the use of a code generation flow based on the (unmodified) gcc compiler chain.  ...  Our approach relies on representing the application under the form of a dependent task system encoded as a data-flow program written in a synchronous language similar to Scade [39].  ... 
doi:10.1109/recosoc.2014.6861367 dblp:conf/recosoc/CarleDGPPSWZ14 fatcat:mbxqa25um5hzviyd6ythe4id74

Calculation of worst-case execution time for multicore processors using deterministic execution

Hamid Mushtaq, Zaid Al-Ars, Koen Bertels
2015 2015 25th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS)  
Worst-case execution time (WCET) analysis has reached a high level of precision in the analysis of sequential programs executing on single-cores.  ...  In this paper we extend a state-of-the-art WCET analysis technique to compute tight WCET estimates of parallel applications running on multicores.  ...  Control-flow analysis. This phase extracts information about possible execution paths from the program source or binary. The output of this phase is a data structure representing the possible flows.  ... 
doi:10.1109/patmos.2015.7347584 dblp:conf/patmos/MushtaqAB15 fatcat:fl7tbwaeezbcngayfxmhj5m6ge

Models of Communication for Multicore Processors

Martin Schoeberl, Rasmus Bo Sorensen, Jens Sparso
2015 2015 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing Workshops  
To efficiently use multicore processors we need to ensure that almost all data communication stays on chip, i.e., the bits moved between tasks executing on different processor cores do not leave the chip  ...  Different forms of on-chip communication are supported by different hardware mechanisms, e.g., shared caches with cache coherency protocols, core-to-core networks-on-chip, and shared scratchpad memories.  ...  Analysis on Code Level (TACLe).  ... 
doi:10.1109/isorcw.2015.57 dblp:conf/isorc/SchoeberlSS15 fatcat:2xfammjbbnb5vkdm6zfd5wxefy

An efficient scheduler of RTOS for multi/many-core system

Xiongli Gu, Peng Liu, Mei Yang, Jie Yang, Cheng Li, Qingdong Yao
2012 Computers & electrical engineering  
This paper presents a scheduler of master-slave real-time operating system (RTOS) to manage the thread running for the distributed multi/many-core system without shared memories.  ...  Recently there is a trend to broaden the usage of lower-power embedded media processor core to build the future high-end computing  ...  This work is supported in part by NSFC under grants 60873112 and 61028004, and the National High Technology Research and Development Program of China under Grant 2009AA01Z109.  ... 
doi:10.1016/j.compeleceng.2011.09.009 fatcat:3cmfdlw5rzewpbrjdrwa7ndl7e

A Multi-thread Data Flow Solution Applying to Java Extension

Li Chen
2012 Physics Procedia  
This paper proposes a multi-thread solution based on data flow and Java extensions, and presents a new multi-thread programming method.  ...  As multi-core processor environments become increasingly popular, parallelizing application programs for this architecture has become a focus of study.  ...  On the other hand, across different multi-core multi-threaded processors, the processing power of the hardware threads and the way programs use the cache differ greatly, which will directly affect  ... 
doi:10.1016/j.phpro.2012.03.372 fatcat:3ssg57gbbbh4ldx22vpjdi6yr4

Hierarchical Dataflow Model for efficient programming of clustered manycore processors

Julien Hascoet, Karol Desnos, Jean-Francois Nezan, Benoit Dupont de Dinechin
2017 2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP)  
Deployment of an image processing application on a many-core MPSoC results in speedups of up to 58.7 compared to the sequential execution.  ...  Programming Multiprocessor Systems-on-Chips (MPSoCs) with hundreds of heterogeneous Processing Elements (PEs), complex memory architectures, and Networks-on-Chips (NoCs) remains a challenge for embedded  ...  Although we have good scalability, we hit the memory bandwidth wall (Section IV-B2) of many-core processors when the 256 cores are competing for the global main memory.  ... 
doi:10.1109/asap.2017.7995270 dblp:conf/asap/HascoetDND17 fatcat:zrl6ffctonhrzixhzhh35nbzdy

Parallel code generation of synchronous programs for a many-core architecture

Amaury Graillat, Matthieu Moy, Pascal Raymond, Benoit Dupont de Dinechin
2018 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE)  
Implementation of such programs on a many-core architecture must ensure a bounded response time and preserve the functional behavior by taking interference into account.  ...  Data-flow Synchronous languages such as Lustre or Scade are widely used for avionic critical software. Programs are described by networks of computational nodes.  ...  ACKNOWLEDGMENT Many thanks to Mustapha Lo (Airbus Helicopter/Verimag) who provided the idea and structure of the "sensor processing" case-study.  ... 
doi:10.23919/date.2018.8342182 dblp:conf/date/GraillatMRD18 fatcat:byutr534pzakpj6ixbvsyiakxi

Memory Architecture and Management in an NoC Platform [chapter]

Axel Jantsch, Xiaowen Chen, Abdul Naeem, Yuang Zhang, Sando Penolazzi, Zhonghai Lu
2011 Scalable Multi-core Architectures  
On-chip computation is moving away from a sequential to a parallel paradigm leading to dozens, hundreds, and soon even thousands of cores and computational units on a single die.  ...  These many-core chips can be highly homogeneous or irregular and heterogeneous, depending on the application area and market segment.  ...  We propose a programmable hardware block, called a Data Management Engine, that supports this architectural adaptation in multiple ways.  ... 
doi:10.1007/978-1-4419-6778-7_1 fatcat:ja4wt52fnzb43okhyabajri33a

A scalable thread scheduling co-processor based on data-flow principles

R. Giorgi, A. Scionti
2015 Future generations computer systems  
Highlights: • We present a data-flow based co-processor supporting the execution of fine-grain threads. • We propose a minimalistic core ISA extension for data-flow threads. • We propose a two-level  ...  hierarchical scheduling co-processor that implements the ISA extension. • We show the scalability of the proposed system through a set of experimental results. Abstract: Large synchronization and  ...  Synchronization among various threads running on separated cores is one of the barriers to the scalability of homogeneous many-core chips [2, 3].  ... 
doi:10.1016/j.future.2014.12.014 fatcat:ugefe3bre5h3xd2ktz5spbcvuy

HELIX-RC: An architecture-compiler co-design for automatic parallelization of irregular programs

Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, David Brooks
2014 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA)  
Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.  ...  Data dependences in sequential programs limit parallelization because extracted threads cannot run independently.  ...  Acknowledgements We thank the anonymous reviewers for their feedback on numerous manuscripts. Moreover, we would like to thank Glenn Holloway for his invaluable contributions to the HELIX project.  ... 
doi:10.1109/isca.2014.6853215 dblp:conf/isca/CampanoniBKJWB14 fatcat:uv7k7p2v4bf2pklh4fgnmkuvma

A Reactive and Cycle-True IP Emulator for MPSoC Exploration

S. Mahadevan, F. Angiolini, J. Sparso, L. Benini, J. Madsen
2008 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
Therefore, for the optimization of the MPSoC interconnect, the designer must develop traffic models that realistically capture the application behavior as executing on the IP core.  ...  The RIPE is built as a multithreaded abstract instruction-set processor, and it can generate reactive traffic patterns.  ...  of the slave response time.  ... 
doi:10.1109/tcad.2007.906990 fatcat:tpxuq26bsjd3deoc5zshmfxgdi

HELIX-RC

Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, David Brooks
2014 SIGARCH Computer Architecture News  
Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.  ...  Data dependences in sequential programs limit parallelization because extracted threads cannot run independently.  ...  Acknowledgements We thank the anonymous reviewers for their feedback on numerous manuscripts. Moreover, we would like to thank Glenn Holloway for his invaluable contributions to the HELIX project.  ... 
doi:10.1145/2678373.2665705 fatcat:g5va7ht7wndb7lec5ar3udp5g4

Automatically accelerating non-numerical programs by architecture-compiler co-design

Simone Campanoni, Kevin Brownell, Svilen Kanev, Timothy M. Jones, Gu-Yeon Wei, David Brooks
2017 Communications of the ACM  
Simulations of HELIX-RC, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.  ...  Because of the high cost of communication between processors, compilers that parallelize loops automatically have been forced to skip a large class of loops that are both critical to performance and rich  ...  INTRODUCTION On a multicore processor, the performance of a program depends largely on how well it exploits parallel threads.  ... 
doi:10.1145/3139461 fatcat:lwedxnvsxzatplcnwu4wbhkzsi