CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers
[article]
2019
arXiv
pre-print
CADS is also able to share the DRAM while guaranteeing fairness to all cores accessing memory. ...
However, current multicore processors are using traditional memory controllers, which are designed for single-core processors. ...
RELATED WORK Many memory scheduling policies have been proposed to improve the efficiency of memory accesses in the context of single-core processors. ...
arXiv:1907.07776v1
fatcat:vvrjpymdxvb7dcrdmzveu75hoe
Rainbow: A Composable Coherence Protocol for Multi-Chip Servers
[article]
2020
arXiv
pre-print
This paper introduces a new coherence protocol suitable, in terms of complexity and scalability, for this class of systems. ...
The coordinated work of both structures minimizes the coherence-related effects on the average memory latency perceived by the processor. ...
Figure 9. HTA-normalized average memory access latency for Rainbow in a 2-CMP system for 16k entries. ...
arXiv:2002.03944v1
fatcat:wfk2htl7efcutmmgg2kacqpp5e
Cost-effectively offering private buffers in SoCs and CMPs
2011
Proceedings of the international conference on Supercomputing - ICS '11
Much like shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage for a heterogeneous mix of cores and accelerators in future SoCs and CMPs. ...
We demonstrate cost-effectiveness of the BiC using SoC-based low-power servers and CMP-based servers with on-chip NIC. ...
ACKNOWLEDGEMENTS We thank Sally McKee for her help to improve the quality of this paper. ...
doi:10.1145/1995896.1995940
dblp:conf/ics/FangZIFGLLKJM11
fatcat:u4amjcdfffgqljgcgzr4jidmk4
XChange: A market-based approach to scalable dynamic multi-resource allocation in multicore architectures
2015
2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)
Through XChange, the CMP functions as a market, where each shared resource is assigned a price which changes over time, and each core seeks to maximize its own utility, by bidding for these shared resources ...
Efficiently allocating shared on-chip resources across cores is critical to optimize execution in chip multiprocessors (CMPs). ...
Acknowledgments We are grateful to the anonymous reviewers for their thoughtful feedback, which helped improve the paper. ...
doi:10.1109/hpca.2015.7056026
dblp:conf/hpca/WangM15
fatcat:zfqpxh3vdvb6lcuuc4uhjnklx4
Scheduling for Better Energy Efficiency on Many-Core Chips
[chapter]
2017
Lecture Notes in Computer Science
Many-core chips are especially attractive for data center operators providing cloud computing service models. ...
In this paper, we demonstrate that many-core chips offer new opportunities for extremely light-weight migration of independent processes (or OSes) running bare-metal on the many-core chip. ...
Hosting providers, for example, providing access to bare-metal hosts can execute independent OSes on the different physical cores of a CMP. ...
doi:10.1007/978-3-319-61756-5_3
fatcat:uyuxuwyk55b2vba2e4ewommq64
Self-Optimizing Memory Controllers: A Reinforcement Learning Approach
2008
2008 International Symposium on Computer Architecture
Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization ...
Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. ...
[25] describe policies that reorder accesses from different streams in stream-based computations. ...
doi:10.1109/isca.2008.21
dblp:conf/isca/IpekMMC08
fatcat:ct2pofqkrfaxlbtxvnp3corlam
Self-Optimizing Memory Controllers
2008
SIGARCH Computer Architecture News
Our results show that an RL-based memory controller improves the performance of a set of parallel applications run on a 4-core CMP by 19% on average (up to 33%), and it improves DRAM bandwidth utilization ...
Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. ...
[25] describe policies that reorder accesses from different streams in stream-based computations. ...
doi:10.1145/1394608.1382172
fatcat:rqvs4uqidfdqhkjfzpbzpvtdqm
3D Stacked Cache Data Management for Energy Minimization of 3D Chip Multiprocessor
2015
International Journal of Students Research in Technology & Management
The suggested method considers both temperature distribution and memory traffic of 3-D CMPs. ...
In this model a runtime cache data mapping is discussed for 3-D stacked L2 caches to minimize the overall energy of 3-D chip multiprocessors (CMPs). ...
For example, the amount of turned on cache blocks (i.e., capacity) can be optimally determined and assigned to each core based on the memory access demands of applications and, then, the unassigned cache ...
doi:10.18510/ijsrtm.2015.325
fatcat:wxb36ypu2zfizmltie2xjqvxe4
Type-Directed Compilation for Multicore Programming
2009
Electronic Notes in Theoretical Computer Science
A Machine Model for CMP. ...
One basic method employs direct asynchronous data transfer, or Direct Memory Access (DMA), to an on-chip memory local to each core. ...
The example under consideration does not use shared access to main memory. ...
doi:10.1016/j.entcs.2009.06.006
fatcat:kffc32h7ajbozellytwabdyxde
REPAS: Reliable Execution for Parallel ApplicationS in Tiled-CMPs
[chapter]
2009
Lecture Notes in Computer Science
To address this issue, we present REPAS (Reliable execution of Parallel ApplicationS in tiled-CMPs), a novel RMT mechanism to provide reliable execution in shared-memory applications. ...
execution is made within 2-way SMT cores in which the majority of hardware is shared. ...
However, the mute core only accesses memory by means of non-coherent requests called phantom requests, providing redundant access to the memory system. ...
doi:10.1007/978-3-642-03869-3_32
fatcat:ijhk2vxqx5dubdyp4engtqkloq
High-Performance Energy-Efficient Multicore Embedded Computing
2012
IEEE Transactions on Parallel and Distributed Systems
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high-transistor density for high performance. ...
Embedded systems differ from traditional high-performance supercomputers in that power is a first-order constraint for embedded systems; whereas, performance is the major benchmark for supercomputers. ...
Mobile agent (autonomous software agent)-based distributed embedded applications allow the process state to be saved and transported to another new embedded system where the process resumes execution from ...
doi:10.1109/tpds.2011.214
fatcat:vagqmojdsjevvc2u2ewqrcjjpq
MOPED: Orchestrating interprocess message data on CMPs
2011
2011 IEEE 17th International Symposium on High Performance Computer Architecture
Future CMPs will combine many simple cores with deep cache hierarchies. ...
Off-chip memory misses are reduced by 43-88% for applications and by 75-100% for microbenchmarks. ...
Heterogeneous cores offer one answer to this problem: sections of an existing code base can be rewritten for execution on a many-core accelerator such as a GPU, then wrapped with additional code to move ...
doi:10.1109/hpca.2011.5749721
dblp:conf/hpca/GuLKS11
fatcat:l2ca6frrmzgv7ockyr7l6zrzrq
CompROS: A composable ROS2 based architecture for real-time embedded robotic development
2021
Zenodo
Robot Operating System (ROS) is a de-facto standard robot middleware in many academic and industrial use cases. ...
In this paper, we address these limiting factors by proposing a hardware-software architecture, CompROS, for ROS2-based robotic development in a Multi-Processor System on Chip (MPSoC) platform that ...
[20] proposed ROS-Lite as a ROS based framework for NoC-based embedded many-core platforms. ...
doi:10.5281/zenodo.5275416
fatcat:742zz2tgdjaxzpis3bxtlfmwey
VM3: Measuring, modeling and managing VM shared resources
2009
Computer Networks
In such environments, contention for shared platform resources (CPU cores, shared cache space, shared memory bandwidth, etc.) can have a significant effect on each virtual machine's performance. ...
Our measurement and modeling experiments are based on a consolidation benchmark (vConsolidate) running on a state-of-the-art CMP server. ...
memory bandwidth (but not for cores). ...
doi:10.1016/j.comnet.2009.04.015
fatcat:ya64n6l3nrgchpbcsybetj67lm
In large-scale CMPs, the communication efficiency among cores is crucial for the overall system performance and energy consumption. ...
For example, compared with MWSR design, SUOR achieves 2.58× throughput as well as saves 64% energy consumption on average in a 256-core CMP. ...
The throughputs and delay of three designs for 64-core CMP are shown in Figure 9. ...
doi:10.1145/2600072
fatcat:tazivi6vyjd57fsqu7qa2yisim