
Cache equalizer

Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem
2011 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers - HiPEAC '11  
Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process  ...  Simulation results using a full-system simulator demonstrate that CE achieves an average L2 miss rate reduction of 13.6% over a shared NUCA scheme and by as much as 46.7% for the benchmark programs we  ...  ., Tilera's Tile64 and Intel's Teraflops Research Chip) that co-locate distributed cores with distributed cache banks in tiles communicating via a network on-chip (NoC) [12] .  ... 
doi:10.1145/1944862.1944889 dblp:conf/hipeac/HammoudCM11 fatcat:gzndgemzqzabtn4jdec2dmq5hi

FELI: HW/SW Support for On-Chip Distributed Shared Memory in Multicores [chapter]

Carlos Villavieja, Yoav Etsion, Alex Ramirez, Nacho Navarro
2011 Lecture Notes in Computer Science  
It relies on a set of TLB counters and on dynamic migration of pages from off-chip memory to on-chip memory.  ...  FELI can automatically allocate on-chip memory to an average of 90% of the applications' working sets.  ...  Special thanks to the members of the Heterogeneous Architecture group at BSC and the anonymous reviewers for their comments and suggestions.  ... 
doi:10.1007/978-3-642-23400-2_27 fatcat:tr4recb4m5g67etayk7bj7gnpa

Open-Scale: A Scalable, Open-Source NOC-based MPSoC for Design Space Exploration

Remi Busseuil, Lyonel Barthe, Gabriel Marchesan Almeida, Luciano Ost, Florent Bruguier, Gilles Sassatelli, Pascal Benoit, Michel Robert, Lionel Torres
2011 2011 International Conference on Reconfigurable Computing and FPGAs  
The main objective of this platform is to provide a complete framework for research and development on NoC-based distributed-memory MPSoCs.  ...  As a consequence, one of the most promising embedded architectures consists of the replication of Processing Elements (PEs) connected through a Network-on-Chip (NoC).  ...  INTRODUCTION The increasing complexity of applications and higher performance demands make Multiprocessor Systems-on-Chip (MPSoCs) a valuable alternative for dealing with today's embedded requirements  ... 
doi:10.1109/reconfig.2011.66 dblp:conf/reconfig/BusseuilBAOBSBRT11 fatcat:qcj7mvj43rhwniavjnlbcxrfye

Managing QoS flows at task level in NoC-based MPSoCs

Everton Carara, Ney Calazans, Fernando Moraes
2009 2009 17th IFIP International Conference on Very Large Scale Integration (VLSI-SoC)  
This work bridges the hardware/software gap, exploring the integration of low-level NoC services into an application programming interface (API).  ...  An important issue in MPSoC design is QoS, since applications running in such systems, such as video processing or fast communication protocols, may have tight timing constraints.  ...  INTRODUCTION Multiprocessor systems-on-chip (MPSoCs) provide a huge design space to explore for applications with high computational demands.  ... 
doi:10.1109/vlsisoc.2009.6041343 fatcat:axvfwvk2infydaw2hgheoniydm

A Dynamic Pressure-Aware Associative Placement Strategy for Large Scale Chip Multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2010 IEEE Computer Architecture Letters  
Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement  ...  Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined.  ...  ., Tilera's Tile64 and Intel's Teraflops Research Chip) that co-locate distributed cores with distributed cache banks in tiles communicating via a network on-chip (NoC) [11] .  ... 
doi:10.1109/l-ca.2010.7 fatcat:5obf374lfnbnzhyuy2qr2r4r2i

Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Woo-Cheol Kwon, Tushar Krishna, Li-Shiuan Peh
2014 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14  
However, this complicates the problem of data tracking and search/invalidation; tracking the state of a line at all on-chip caches at a directory or performing full-chip broadcasts are both non-scalable  ...  In this paper, we make the case for Locality-Oblivious Cache Organization (LOCO), a CMP cache organization that leverages the on-chip network to create virtual single-cycle paths between distant caches  ...  network (STARnet), under the Center for Future Architectures (C-FAR) research center.  ... 
doi:10.1145/2541940.2541976 dblp:conf/asplos/KwonKP14 fatcat:fefauexc45hghcbkglxtokx4rq

C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2011 Journal of Parallel and Distributed Computing  
This paper describes Constrained Associative-Mapping-of-Tracking-Entries (C-AMTE), a scalable mechanism to facilitate flexible and efficient distributed cache management in large-scale chip multiprocessors  ...  C-AMTE enables fast locating of cache blocks in CMP cache schemes that employ one-to-one or one-to-many associative mappings.  ...  ., Tilera's Tile64 and Intel's Teraflops Research Chip) that co-locate distributed cores with distributed cache banks in tiles communicating via a network on-chip (NoC) [13] .  ... 
doi:10.1016/j.jpdc.2010.11.009 fatcat:qhcpugsfwrhkhnwxy5zu43vwi4

Feedback-Driven Restructuring of Multi-threaded Applications for NUCA Cache Performance in CMPs

Sandro Bartolini, Pierfrancesco Foglia, Marco Solinas, Cosimo Antonio Prete
2010 2010 22nd International Symposium on Computer Architecture and High Performance Computing  
We show techniques for altering the distribution of applications in the cache space so as to achieve improved average memory access time.  ...  We consider a number of Splash-2 and Parsec benchmarks on an 8-processor system and we show that a relatively simple remapping algorithm is able to improve the average Static-NUCA (SNUCA) cache access  ...  ACKNOWLEDGMENT The authors would like to thank their colleague Manuel Comparetti for the insightful discussions on NUCA caches, and for the tests performed on the simulator.  ... 
doi:10.1109/sbac-pad.2010.20 dblp:conf/sbac-pad/BartoliniFSP10 fatcat:vuxo5p555zacxmqx2kru3zbsye

Physical-aware system-level design for tiled hierarchical chip multiprocessors

Jordi Cortadella, Javier de San Pedro, Nikita Nikitin, Jordi Petit
2013 Proceedings of the 2013 ACM international symposium on International symposium on physical design - ISPD '13  
In this work, the importance of physical-aware system-level exploration is investigated, and a strategy for deriving chip floorplans is described.  ...  The combination of architectural exploration and physical planning is studied with an example and the impact of the physical aspects on the selection of architectural parameters is evaluated.  ...  The given formulation is an example of the architectural exploration problem with the objective of efficiently distributing the chip resources among the components of a multi-core system, e.g. cores, memories  ... 
doi:10.1145/2451916.2451920 dblp:conf/ispd/CortadellaPNP13 fatcat:ufzyr2nvlzg7zd4og7af4oe46u

Exploiting multicast messages in cache-coherence protocols for NoC-based MPSoCs

Tales M. Chaves, Everton A. Carara, Fernando G. Moraes
2011 6th International Workshop on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC)  
The shift in the communication infrastructure, from buses to networks-on-chip (NoCs), adds new design challenges.  ...  The main functionality NoCs may provide for the protocols is the way messages are sent through the network. Most NoCs support multicast as a set of unicast messages.  ...  ACKNOWLEDGMENTS The Authors acknowledge the support of CNPq, projects 301599/2009-2 and 133526/2010-0, and FAPERGS project 10/0814-9.  ... 
doi:10.1109/recosoc.2011.5981492 dblp:conf/recosoc/ChavesCM11 fatcat:zrxbt3upi5djri3apzas5p4m2y

Manycore network interfaces for in-memory rack-scale computing

Alexandros Daglis, Stanko Novaković, Edouard Bugnion, Babak Falsafi, Boris Grot
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol  ...  Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth  ...  Mirzadeh and the rest of the PARSA group for their feedback and support.  ... 
doi:10.1145/2749469.2750415 dblp:conf/isca/DaglisNBFG15 fatcat:sh6qqz6rkvdc3eb3emwovgpk7y

Manycore network interfaces for in-memory rack-scale computing

Alexandros Daglis, Stanko Novaković, Edouard Bugnion, Babak Falsafi, Boris Grot
2015 SIGARCH Computer Architecture News  
Our best manycore NI architecture achieves latencies within 3% of an idealized hardware NUMA and efficiently uses the full bisection bandwidth of the NOC, without changing the on-chip coherence protocol  ...  Our results indicate that a careful splitting of NI functionality per chip tile and at the chip's edge along a NOC dimension enables a rack-scale architecture to optimize for both latency and bandwidth  ...  Mirzadeh and the rest of the PARSA group for their feedback and support.  ... 
doi:10.1145/2872887.2750415 fatcat:hflmsptnsjhfdn6qoicsvqwude

SpiNNaker: A multi-core System-on-Chip for massively-parallel neural net simulation

Eustace Painkras, Luis A. Plana, Jim Garside, Steve Temple, Simon Davidson, Jeffrey Pepper, David Clark, Cameron Patterson, Steve Furber
2012 Proceedings of the IEEE 2012 Custom Integrated Circuits Conference  
The basic block of the machine is the SpiNNaker multicore System-on-Chip, a Globally Asynchronous Locally Synchronous (GALS) system with 18 ARM968 processor nodes residing in synchronous islands, surrounded  ...  The modelling of large systems of spiking neurons is computationally very demanding in terms of processing power and communication.  ...  The die photo in Fig. 4(b) is courtesy of Unisem Europe Ltd.  ... 
doi:10.1109/cicc.2012.6330636 dblp:conf/cicc/PainkrasPGTDPCPF12 fatcat:cm5i4u3wa5ghffa52nxeqrynwa

Runtime Detection of a Bandwidth Denial Attack from a Rogue Network-on-Chip

Rajesh JS, Dean Michael Ancajas, Koushik Chakraborty, Sanghamitra Roy
2015 Proceedings of the 9th International Symposium on Networks-on-Chip - NOCS '15  
This work explores a covert threat model for multiprocessor systems-on-chip designed using third-party NoCs.  ...  A NoC is an interconnect network for the glueless integration of on-chip components in modern, complex, communication-centric designs.  ...  For example, an application with a heavy memory footprint depends strongly on the on-chip memory controllers, as well as on SoC nodes with cache slices housing pertinent data.  ... 
doi:10.1145/2786572.2786580 dblp:conf/nocs/SACR15 fatcat:xnd254j74zdwbnfq62osgmz4cm

Dynamic thread and data mapping for NoC based CMPs

Mahmut Kandemir, Ozcan Ozturk, Sai P. Muralidhara
2009 Proceedings of the 46th Annual Design Automation Conference - DAC '09  
Thread mapping and data mapping are two important problems in the context of NoC (network-on-chip) based CMPs (chip multiprocessors).  ...  In this work, we present dynamic (runtime) thread and data mappings for NoC based CMPs.  ...  For this purpose, we first present an application-specific, dynamic thread assignment strategy for NoC based CMP systems.  ... 
doi:10.1145/1629911.1630129 dblp:conf/dac/KandemirOM09 fatcat:xti3qbimcrbvngdakbdar5nco4