CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers [article]

Eduardo Olmedo Sanchez, Xian-He Sun
2019 arXiv   pre-print
CADS is also able to share the DRAM while guaranteeing fairness to all cores accessing memory.  ...  However, current multicore processors are using traditional memory controllers, which are designed for single-core processors.  ...  RELATED WORK Many memory scheduling policies have been proposed to improve the efficiency of memory accesses in the context of single-core processors.  ... 
arXiv:1907.07776v1 fatcat:vvrjpymdxvb7dcrdmzveu75hoe

Rainbow: A Composable Coherence Protocol for Multi-Chip Servers [article]

Lucia G. Menezo, Valentin Puente, Jose A. Gregorio
2020 arXiv   pre-print
This paper introduces a new coherence protocol suitable, in terms of complexity and scalability, for this class of systems.  ...  The coordinated work of both structures minimizes the coherence-related effects on the average memory latency perceived by the processor.  ...  Figure 9. HTA-normalized average memory access latency for Rainbow in a 2-CMP system for 16k entries.  ... 
arXiv:2002.03944v1 fatcat:wfk2htl7efcutmmgg2kacqpp5e

Cost-effectively offering private buffers in SoCs and CMPs

Zhen Fang, Srihari Makineni, Li Zhao, Ravishankar R. Iyer, Carlos Flores Fajardo, German Fabila Garcia, Seung Eun Lee, Bin Li, Steve R. King, Xiaowei Jiang
2011 Proceedings of the international conference on Supercomputing - ICS '11  
Much like shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage for a heterogeneous mix of cores and accelerators in future SoCs and CMPs.  ...  We demonstrate cost-effectiveness of the BiC using SoC-based low-power servers and CMP-based servers with on-chip NIC.  ...  ACKNOWLEDGEMENTS We thank Sally McKee for her help to improve the quality of this paper.  ... 
doi:10.1145/1995896.1995940 dblp:conf/ics/FangZIFGLLKJM11 fatcat:u4amjcdfffgqljgcgzr4jidmk4

XChange: A market-based approach to scalable dynamic multi-resource allocation in multicore architectures

Xiaodong Wang, Jose F. Martinez
2015 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
Through XChange, the CMP functions as a market, where each shared resource is assigned a price which changes over time, and each core seeks to maximize its own utility, by bidding for these shared resources  ...  Efficiently allocating shared on-chip resources across cores is critical to optimize execution in chip multiprocessors (CMPs).  ...  Acknowledgments We are grateful to the anonymous reviewers for their thoughtful feedback, which helped improve the paper.  ... 
doi:10.1109/hpca.2015.7056026 dblp:conf/hpca/WangM15 fatcat:zfqpxh3vdvb6lcuuc4uhjnklx4

Scheduling for Better Energy Efficiency on Many-Core Chips [chapter]

Chanseok Kang, Seungyul Lee, Yong-Jun Lee, Jaejin Lee, Bernhard Egger
2017 Lecture Notes in Computer Science  
Many-core chips are especially attractive for data center operators providing cloud computing service models.  ...  In this paper, we demonstrate that many-core chips offer new opportunities for extremely light-weight migration of independent processes (or OSes) running bare-metal on the many-core chip.  ...  Hosting providers, for example, providing access to bare-metal hosts can execute independent OSes on the different physical cores of a CMP.  ... 
doi:10.1007/978-3-319-61756-5_3 fatcat:uyuxuwyk55b2vba2e4ewommq64

Self-Optimizing Memory Controllers: A Reinforcement Learning Approach

Engin Ipek, Onur Mutlu, José F. Martínez, Rich Caruana
2008 2008 International Symposium on Computer Architecture  
Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization  ...  Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior.  ...  [25] describe policies that reorder accesses from different streams in stream-based computations.  ... 
doi:10.1109/isca.2008.21 dblp:conf/isca/IpekMMC08 fatcat:ct2pofqkrfaxlbtxvnp3corlam

Self-Optimizing Memory Controllers

Engin Ipek, Onur Mutlu, José F. Martínez, Rich Caruana
2008 SIGARCH Computer Architecture News  
Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization  ...  Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior.  ...  [25] describe policies that reorder accesses from different streams in stream-based computations.  ... 
doi:10.1145/1394608.1382172 fatcat:rqvs4uqidfdqhkjfzpbzpvtdqm

3D Stacked Cache Data Management for Energy Minimization of 3D Chip Multiprocessor

K. Suresh Kumar, S. Anitha, M. Gayathri
2015 International Journal of Students Research in Technology & Management  
The suggested method considers both temperature distribution and memory traffic of 3-D CMPs.  ...  In this model a runtime cache data mapping is discussed for 3-D stacked L2 caches to minimize the overall energy of 3-D chip multiprocessors (CMPs).  ...  For example, the amount of turned on cache blocks (i.e., capacity) can be optimally determined and assigned to each core based on the memory access demands of applications and, then, the unassigned cache  ... 
doi:10.18510/ijsrtm.2015.325 fatcat:wxb36ypu2zfizmltie2xjqvxe4

Type-Directed Compilation for Multicore Programming

Kohei Honda, Vasco T. Vasconcelos, Nobuko Yoshida
2009 Electronic Notes in Theoretical Computer Science  
A Machine Model for CMP.  ...  One basic method employs direct asynchronous data transfer, or Direct Memory Access (DMA), to an on-chip memory local to each core.  ...  The example under consideration does not use shared access to main memory.  ... 
doi:10.1016/j.entcs.2009.06.006 fatcat:kffc32h7ajbozellytwabdyxde

REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs [chapter]

Daniel Sánchez, Juan L. Aragón, José M. García
2009 Lecture Notes in Computer Science  
To address this issue, we present REPAS (Reliable execution of Parallel ApplicationS in tiled-CMPs), a novel RMT mechanism to provide reliable execution in shared-memory applications.  ...  execution is made within 2-way SMT cores in which the majority of hardware is shared.  ...  However, the mute core only accesses memory by means of non-coherent requests called phantom requests, providing redundant access to the memory system.  ... 
doi:10.1007/978-3-642-03869-3_32 fatcat:ijhk2vxqx5dubdyp4engtqkloq

High-Performance Energy-Efficient Multicore Embedded Computing

A. Munir, S. Ranka, A. Gordon-Ross
2012 IEEE Transactions on Parallel and Distributed Systems  
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high-transistor density for high performance.  ...  Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems; whereas, performance is the major benchmark for supercomputers.  ...  Mobile agent (autonomous software agent)-based distributed embedded applications allow the process state to be saved and transported to another new embedded system where the process resumes execution from  ... 
doi:10.1109/tpds.2011.214 fatcat:vagqmojdsjevvc2u2ewqrcjjpq

MOPED: Orchestrating interprocess message data on CMPs

Junli Gu, Steven S. Lumetta, Rakesh Kumar, Yihe Sun
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
Future CMPs will combine many simple cores with deep cache hierarchies.  ...  Off-chip memory misses are reduced by 43-88% for applications and by 75-100% for microbenchmarks.  ...  Heterogeneous cores offer one answer to this problem: sections of an existing code base can be rewritten for execution on a many-core accelerator such as a GPU, then wrapped with additional code to move  ... 
doi:10.1109/hpca.2011.5749721 dblp:conf/hpca/GuLKS11 fatcat:l2ca6frrmzgv7ockyr7l6zrzrq

CompROS: A composable ROS2 based architecture for real-time embedded robotic development

Saeid Dehnavi, Martijn Koedam, Andrew Nelson, Dip Goswami, Kees Goossens
2021 Zenodo  
Robot Operating System (ROS) is a de-facto standard robot middleware in many academic and industrial use cases.  ...  In this paper, we address these limiting factors by proposing a hardware-software architecture -CompROS- for ROS2 based robotic development in a Multi-Processor System on Chip (MPSoC) platform that.  ...  [20] proposed ROS-Lite as a ROS based framework for NoC-based embedded many-core platforms.  ... 
doi:10.5281/zenodo.5275416 fatcat:742zz2tgdjaxzpis3bxtlfmwey

VM3: Measuring, modeling and managing VM shared resources

Ravi Iyer, Ramesh Illikkal, Omesh Tickoo, Li Zhao, Padma Apparao, Don Newell
2009 Computer Networks  
In such environments, contention for shared platform resources (CPU cores, shared cache space, shared memory bandwidth, etc.) can have a significant effect on each virtual machine's performance.  ...  Our measurement and modeling experiments are based on a consolidation benchmark (vConsolidate) running on a state-of-the-art CMP server.  ...  memory bandwidth (but not for cores).  ... 
doi:10.1016/j.comnet.2009.04.015 fatcat:ya64n6l3nrgchpbcsybetj67lm

SUOR

Xiaowen Wu, Jiang Xu, Yaoyao Ye, Zhehui Wang, Mahdi Nikdast, Xuan Wang
2014 ACM Journal on Emerging Technologies in Computing Systems  
In large-scale CMPs, the communication efficiency among cores is crucial for the overall system performance and energy consumption.  ...  For example, compared with MWSR design, SUOR achieves 2.58× throughput as well as saves 64% energy consumption on average in a 256-core CMP.  ...  The throughputs and delay of three designs for 64-core CMP are shown in Figure 9.  ... 
doi:10.1145/2600072 fatcat:tazivi6vyjd57fsqu7qa2yisim