An OS-based alternative to full hardware coherence on tiled CMPs

Christian Fensch, Marcelo Cintra
2008 High-Performance Computer Architecture  
The proposed mechanism is based on the key ideas that mapping of lines to physical caches is done at the page level with OS support and that hardware supports remote cache accesses.  ...  We evaluate the proposed tiled CMP architecture on the Splash-2 scientific benchmarks and ALPBench multimedia benchmarks against one with private caches and a distributed directory cache coherence mechanism  ...  An alternative to enforce coherence in a distributed memory system is to use the OS' virtual memory (VM) system to handle the copies of virtual pages, as was done on software DSM systems (e.g., [5, 15  ... 
doi:10.1109/hpca.2008.4658652 dblp:conf/hpca/FenschC08 fatcat:25r22s5q4jdehgepsizvllu2qi

Virtual hierarchies to support server consolidation

Michael R. Marty, Mark D. Hill
2007 SIGARCH Computer Architecture News  
on our simulated 64-core CMP.  ...  Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment.  ...  An alternative approach minimizes hardware change by relying on the OS or hypervisor to manage the cache hierarchy through page allocation.  ... 
doi:10.1145/1273440.1250670 fatcat:unky5v7mjrgjrkqo6onif4olqm

Virtual hierarchies to support server consolidation

Michael R. Marty, Mark D. Hill
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
on our simulated 64-core CMP.  ...  Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment.  ...  An alternative approach minimizes hardware change by relying on the OS or hypervisor to manage the cache hierarchy through page allocation.  ... 
doi:10.1145/1250662.1250670 dblp:conf/isca/MartyH07 fatcat:vahlmpg2orblnlks4s5xmruj3u

Reactive NUCA

Nikos Hardavellas, Michael Ferdman, Babak Falsafi, Anastasia Ailamaki
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
R-NUCA cooperates with the operating system to support intelligent placement, migration, and replication without the overhead of an explicit coherence mechanism for the on-chip last-level cache.  ...  Based on this observation, we propose Reactive NUCA (R-NUCA), a distributed cache design which reacts to the class of each cache access and places blocks at the appropriate location in the cache.  ...  Strigkos, and the anonymous reviewers for their feedback on earlier drafts of this paper.  ... 
doi:10.1145/1555754.1555779 dblp:conf/isca/HardavellasFFA09 fatcat:326qapu44fd47o5dt3qm7ghbgy

C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2011 Journal of Parallel and Distributed Computing  
C-AMTE enables fast locating of cache blocks in CMP cache schemes that employ one-to-one or one-to-many associative mappings.  ...  (CMPs).  ...  tracking entries coherent (e.g., 16 bits for a 16-tile CMP model), and (3) an ID that points to the tile that is currently hosting B (e.g., 4 bits for a 16-tile CMP model).  ... 
doi:10.1016/j.jpdc.2010.11.009 fatcat:qhcpugsfwrhkhnwxy5zu43vwi4

A chip prototyping substrate

John D. Davis, Stephen E. Richardson, Charis Charitsis, Kunle Olukotun
2005 SIGARCH Computer Architecture News  
FAST combines configurable and fixed-function hardware and software to facilitate rapid prototyping by utilizing components optimized for their particular tasks: FPGAs for interconnect and glue logic; processors  ...  We illustrate FAST's utility by describing mappings of both a small-scale CMP with speculation support and a large-scale CMP connected using a network.  ...  Acknowledgements We would like to thank Alan Swithenbank and Wade Gupta for their contribution to the FAST PCB layout and fabrication.  ... 
doi:10.1145/1105734.1105740 fatcat:jsvwufawqfgpfpyvuhsxr7qubi

Virtual Hierarchies

Michael R. Marty, Mark D. Hill
2008 IEEE Micro  
(Alternatively, a dynamic home tile could be selected on a per-page basis, either by explicitly encoding a home tile location into the page table or using automatic hardware-based predictors.  ...  To do so, we implemented a policy that uses a simple heuristic based on the block's coherence state.  ... 
doi:10.1109/mm.2008.19 fatcat:rnaagwiwn5hkfpczt6q2r6fn7i

Hardware-modulated parallelism in chip multiprocessors

Julia Chen, Philo Juang, Kevin Ko, Gilberto Contreras, David Penry, Ram Rangan, Adam Stoler, Li-Shiuan Peh, Margaret Martonosi
2005 SIGARCH Computer Architecture News  
The hardware, meanwhile, is designed to offer low-overhead, low-area support for orchestrating and modulating this parallelism on CMPs at runtime.  ...  This paper presents and evaluates a new approach to highly-parallel CMPs, advocating a new hardware-software contract.  ...  Acknowledgments We thank David August and his group at Princeton for their support of our use of the Liberty Simulation Environment (LSE) and for extensive discussions on the NDP architecture.  ... 
doi:10.1145/1105734.1105742 fatcat:d5iferqj5fghrmndyw6b4ill6y

A Dynamic Pressure-Aware Associative Placement Strategy for Large Scale Chip Multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2010 IEEE computer architecture letters  
Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement  ...  Simulation results using a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined.  ...  Cho and Jin [7] proposed an OS-based page allocation algorithm applicable to NUCA architectures. Cache blocks are mapped to the L2 cache space using a simple interleaving on page frame numbers.  ... 
doi:10.1109/l-ca.2010.7 fatcat:5obf374lfnbnzhyuy2qr2r4r2i
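
The snippet above mentions mapping cache blocks to L2 space by "a simple interleaving on page frame numbers". A minimal sketch of one way such a mapping might look follows, assuming a modulo interleave; the function name `home_tile`, the 4 KiB page size, and the 16-tile default are illustrative assumptions, not details taken from the cited work.

```python
def home_tile(physical_address: int, num_tiles: int = 16, page_size: int = 4096) -> int:
    """Return the L2 tile that would host the page containing physical_address.

    Hypothetical sketch: modulo interleaving on the page frame number.
    The page size and tile count are assumed values, not from the source.
    """
    # The page frame number is the physical address with the page offset dropped.
    page_frame_number = physical_address // page_size
    # Consecutive page frames are spread round-robin across the tiles.
    return page_frame_number % num_tiles


if __name__ == "__main__":
    # Every address inside the same page maps to the same tile.
    print(home_tile(0), home_tile(4095))
    # The next page frame lands on the next tile.
    print(home_tile(4096))
```

Because the mapping depends only on the page frame number, an OS could in principle steer a page to a chosen tile by choosing which physical frame to allocate for it.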

Cache equalizer

Mohammad Hammoud, Sangyeun Cho, Rami G. Melhem
2011 Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers - HiPEAC '11  
Temporal pressure at the on-chip last-level cache is continuously collected at a group (comprised of cache sets) granularity, and periodically recorded at the memory controller to guide the placement process  ...  Simulation results using a full-system simulator demonstrate that CE achieves an average L2 miss rate reduction of 13.6% over a shared NUCA scheme and by as much as 46.7% for the benchmark programs we  ...  The generated colored pages can be used by the OS to guide their allocation of physical pages. Cho and Jin [8] proposed an OS-based page allocation algorithm applicable to NUCA architectures.  ... 
doi:10.1145/1944862.1944889 dblp:conf/hipeac/HammoudCM11 fatcat:gzndgemzqzabtn4jdec2dmq5hi

Dynamic cache clustering for chip multiprocessors

Mohammad Hammoud, Sangyeun Cho, Rami Melhem
2009 Proceedings of the 23rd international conference on Conference on Supercomputing - ICS '09  
Simulation results using a full-system simulator demonstrate that DCC outperforms alternative L2 cache designs.  ...  The basic trade-offs of varying the on-chip cache clusters are average L2 access latency and L2 miss rate.  ...  [5] proposed CMP-NuRAPID based on the private design, and tries to control replication based on usage patterns.  ... 
doi:10.1145/1542275.1542289 dblp:conf/ics/HammoudCM09 fatcat:gnrlmmldjbgrfcsp7hjlvrn5aq

Design tradeoffs for simplicity and efficient verification in the Execution Migration Machine

Keun Sup Shim, Mieszko Lis, Myong Hyon Cho, Ilia Lebedev, Srinivas Devadas
2013 2013 IEEE 31st International Conference on Computer Design (ICCD)  
Moreover, Moore's law has led to a ubiquitous trend of an increasing number of cores on a single chip.  ...  or coherence protocols), such remote-access-based directoryless architectures cannot take advantage of any data locality, and therefore suffer in both performance and energy.  ...  Several projects have proposed to transfer the burden of cache coherence from hardware to the OS and software [9], [10], or moved the coherence handling to the OS while preserving hardware support  ... 
doi:10.1109/iccd.2013.6657037 dblp:conf/iccd/ShimLCLD13 fatcat:mgif6qbbn5cmzmpy2ct65jdwte

A scalable organization for distributed directories

Alberto Ros, Manuel E. Acacio, José M. García
2010 Journal of systems architecture  
In this work, we present a distributed directory organization based on duplicate tags for tiled CMP architectures whose size is independent of the number of tiles of the system up to a certain number of  ...  Although directory-based cache-coherence protocols are the best choice when designing chip multiprocessors with tens of cores on-chip, the memory overhead introduced by the directory structure may not  ...  Up to now, most tiled CMP proposals assume a straightforward implementation for the directory structure based on the use of a full-map sharing code.  ... 
doi:10.1016/j.sysarc.2009.11.006 fatcat:g72iqz2yt5bkbdaxt22hsu7bkq

Extending SRT for parallel applications in tiled-CMP architectures

D. Sanchez, J.L. Aragon, J.M. Garcia
2009 2009 IEEE International Symposium on Parallel & Distributed Processing  
However, mechanisms to achieve full coverage to errors usually degrade performance in an unacceptable way for the majority of common users.  ...  We propose an alternative mechanism in which the L1 cache is updated by master's stores before verification reducing the overhead up to 21%.  ...  Our study has been focused on a tiled CMP where each core is a 2-threaded SMT which has its own private L1 cache, a portion of the shared L2 cache and a connection to the on-chip network.  ... 
doi:10.1109/ipdps.2009.5160902 dblp:conf/ipps/SanchezAG09 fatcat:p5rknuu5ardtjb5h4lcsdmk2ne

Flexible architectural support for fine-grain scheduling

Daniel Sanchez, Richard M. Yoo, Christos Kozyrakis
2010 SIGARCH Computer Architecture News  
This paper presents a combined hardware-software approach to build fine-grain schedulers that retain the flexibility of software schedulers while being as fast and scalable as hardware ones.  ...  To make efficient use of CMPs with tens to hundreds of cores, it is often necessary to exploit fine-grain parallelism.  ...  Acknowledgements We sincerely thank Woongki Baek, Jacob Leverich, Anthony Romano and the anonymous reviewers for their useful feedback on earlier versions of this manuscript; the development teams of the  ... 
doi:10.1145/1735970.1736055 fatcat:yh7f7bisnnbr5apzkj5vfyjjvy