1,192 Hits in 7.8 sec

Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches

Manu Awasthi, Kshitij Sudan, Rajeev Balasubramonian, John Carter
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
These mechanisms allow the hardware and OS to dynamically manage cache capacity per thread as well as optimize placement of data shared by multiple threads.  ...  The key innovation is the use of a shadow address space to allow hardware control of data placement in the L2 cache while being largely transparent to the user application and off-chip world.  ...  This allows the hardware to dynamically change page color and page placement without actually copying the page in physical memory or impacting the OS physical memory management policies.  ... 
doi:10.1109/hpca.2009.4798260 dblp:conf/hpca/AwasthiSBC09 fatcat:bxzvguck3jgijf5zvnvzh5qdia
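The snippet above turns on "page color": the bits of a physical page number that select which group of cache sets the page maps into. A minimal sketch follows, assuming a hypothetical 2 MiB, 8-way cache with 4 KiB pages (so 2 MiB / (8 × 4 KiB) = 64 colors). The ShadowColorMap class is an illustrative assumption. It models "recoloring without copying" with a per-page lookup table, not with the paper's actual shadow-address encoding, which is not reproduced here.

```python
# Hypothetical cache geometry, not taken from the paper.
PAGE_SIZE = 4096                 # bytes per page
CACHE_SIZE = 2 * 1024 * 1024     # 2 MiB shared L2
ASSOC = 8                        # 8-way set associative


def num_colors(cache_size=CACHE_SIZE, assoc=ASSOC, page_size=PAGE_SIZE):
    # One way spans cache_size / assoc bytes; each page-sized slice of a way is one color.
    return cache_size // (assoc * page_size)


def page_color(paddr, page_size=PAGE_SIZE, colors=None):
    # Conventional page coloring: low-order bits of the physical page number.
    colors = colors or num_colors()
    return (paddr // page_size) % colors


class ShadowColorMap:
    """Hypothetical model: hardware keeps a per-page color override so the
    cache index uses a new color while the page stays where it is in DRAM."""

    def __init__(self, colors):
        self.colors = colors
        self.override = {}  # physical page number -> assigned color

    def recolor(self, ppn, color):
        # Changing placement is a table update; no page data is copied.
        self.override[ppn] = color % self.colors

    def cache_color(self, paddr, page_size=PAGE_SIZE):
        ppn = paddr // page_size
        return self.override.get(ppn, ppn % self.colors)
```

For example, with 64 colors the page at 0x1000 has color 1; after `recolor(1, 5)` its lines index into color 5's sets while the physical address is unchanged.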

Survey of Memory Management Techniques for HPC and Cloud Computing

Anna Pupykina, Giovanni Agosta
2019 IEEE Access  
However, for this scenario to succeed in practice, resources, including memory, need to be allocated with a vision that includes both the application requirements and the current and future state of the  ...  Traditionally, memory is shared by an operating system using segmentation and paging techniques. At the same time, new classes of applications require Quality of Service (QoS) guarantees.  ...  AGAS allows performing hardware-controlled dynamic mapping, thanks to the fact that distributed shared memory systems create logical shared address space using distributed memory hardware at the OS level  ... 
doi:10.1109/access.2019.2954169 fatcat:hwtpltrdrffqdjdofhr3shjkla

Enhancing Programmability, Portability, and Performance with Rich Cross-Layer Abstractions [article]

Nandita Vijaykumar
2019 arXiv   pre-print
the interfaces and abstractions between the application and the underlying system/hardware, specifically the hardware-software interface.  ...  In doing so, they enable a rich space of hardware-software cooperative mechanisms to optimize for performance.  ...  , memory placement, cache management, and prefetching.  ... 
arXiv:1911.05660v1 fatcat:w5f3g4isqbcphm2jjfzjtvrjnq

Tempest and typhoon

S. K. Reinhardt, J. R. Larus, D. A. Wood
1994 SIGARCH Computer Architecture News  
First, the Stache protocol uses Tempest's fine-grain access control mechanisms to manage part of a processor's local memory as a large, fully-associative cache for remote data.  ...  We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware DirNNB cache-coherence protocol for five shared-memory programs  ...  We especially would like to thank Mark Hill for numerous discussions and suggestions and Alvy Lebeck for several key extensions to Fast-Cache.  ... 
doi:10.1145/192007.192062 fatcat:6yffl7imkzhaparrgqihwlzbfu

Tempest and typhoon

Steven K. Reinhardt, James R. Larus, David A. Wood
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
First, the Stache protocol uses Tempest's fine-grain access control mechanisms to manage part of a processor's local memory as a large, fully-associative cache for remote data.  ...  We simulated Typhoon on the Wisconsin Wind Tunnel and found that Stache running on Typhoon performs comparably (±30%) to an all-hardware DirNNB cache-coherence protocol for five shared-memory programs  ...  We especially would like to thank Mark Hill for numerous discussions and suggestions and Alvy Lebeck for several key extensions to Fast-Cache.  ... 
doi:10.1145/285930.286008 dblp:conf/isca/ReinhardtLW98a fatcat:2ggsngclordifld3ksmlfcri3m
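The Tempest/Stache snippets describe fine-grain access control: each block of local memory carries a tag, and an access that the tag does not permit traps to a user-level handler that fetches the remote data. The toy model below sketches that idea under stated assumptions. The 32-byte block size, the three tag states, and the class and method names (ToyStache, read, write, _fault) are hypothetical and are not Tempest's API. It also omits invalidations and any real coherence protocol.

```python
from enum import Enum

BLOCK = 32  # hypothetical block size in bytes


class Tag(Enum):
    INVALID = 0
    READ_ONLY = 1
    WRITABLE = 2


class ToyStache:
    """Toy model: local memory caches remote blocks, guarded by per-block tags."""

    def __init__(self, remote):
        self.remote = remote  # simulated home memory: block address -> value
        self.local = {}       # locally cached copies of remote blocks
        self.tags = {}        # block address -> Tag (missing means INVALID)
        self.faults = 0

    def _block(self, addr):
        return addr - addr % BLOCK

    def _fault(self, blk, need):
        # Stand-in for a user-level handler: fetch the block from its home
        # node and upgrade the access tag so later accesses proceed directly.
        self.faults += 1
        self.local[blk] = self.remote[blk]
        self.tags[blk] = need

    def read(self, addr):
        blk = self._block(addr)
        if self.tags.get(blk, Tag.INVALID) == Tag.INVALID:
            self._fault(blk, Tag.READ_ONLY)
        return self.local[blk]

    def write(self, addr, value):
        blk = self._block(addr)
        if self.tags.get(blk, Tag.INVALID) != Tag.WRITABLE:
            self._fault(blk, Tag.WRITABLE)
        self.local[blk] = value
```

Only the first access to a block (or a write to a read-only block) takes the handler path. Every later permitted access is a plain local-memory access, which is what lets local memory act as a large cache for remote data.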

Compiler-managed partitioned data caches for low power

Rajiv Ravindran, Michael Chu, Scott Mahlke
2007 SIGPLAN notices  
We propose a hardware/software co-managed partitioned cache architecture in which enhanced load/store instructions are used to control fine-grain data placement within a set of cache partitions.  ...  However, doing this in hardware alone is difficult due to hardware complexity, high power dissipation, overheads of dynamic discovery of application characteristics, and increased likelihood of making  ...  Acknowledgments We would like to thank Dr. Krishnan Kailas and Dr. Zehra Sura of IBM TJ Watson Research Center for the initial discussions on partitioned caches.  ... 
doi:10.1145/1273444.1254809 fatcat:icvlk2szozeaxlbqt4pmwrw77u

Compiler-managed partitioned data caches for low power

Rajiv Ravindran, Michael Chu, Scott Mahlke
2007 Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools - LCTES '07  
We propose a hardware/software co-managed partitioned cache architecture in which enhanced load/store instructions are used to control fine-grain data placement within a set of cache partitions.  ...  However, doing this in hardware alone is difficult due to hardware complexity, high power dissipation, overheads of dynamic discovery of application characteristics, and increased likelihood of making  ...  Acknowledgments We would like to thank Dr. Krishnan Kailas and Dr. Zehra Sura of IBM TJ Watson Research Center for the initial discussions on partitioned caches.  ... 
doi:10.1145/1254766.1254809 dblp:conf/lctrts/RavindranCM07 fatcat:wavoh73t6ne27nusrbyznwdxz4
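In the co-managed partitioned cache described above, enhanced load/store instructions name the partition to use, so a compiler can steer each data object into its own partition and only that partition needs to be probed. The sketch below models this under stated assumptions. Each partition is a small fully associative LRU array, and the class name, parameters, and probe counter are hypothetical illustrations, not the paper's organization or ISA.

```python
from collections import OrderedDict


class PartitionedCache:
    """Toy model: every access names its partition explicitly, so a lookup
    probes one partition instead of the whole cache."""

    def __init__(self, num_partitions, lines_per_partition, line_size=32):
        self.parts = [OrderedDict() for _ in range(num_partitions)]
        self.capacity = lines_per_partition
        self.line_size = line_size
        self.probes = 0  # one probe per access, regardless of partition count

    def access(self, addr, partition):
        line = addr // self.line_size
        part = self.parts[partition]
        self.probes += 1
        if line in part:
            part.move_to_end(line)  # refresh LRU position
            return True             # hit
        if len(part) >= self.capacity:
            part.popitem(last=False)  # evict least recently used line
        part[line] = None
        return False                  # miss, line now filled
```

Because placement is explicit, two objects assigned to different partitions cannot evict each other, and the probe count per access stays at one, which is the kind of saving the low-power argument relies on.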

Main Memory Scaling: Challenges and Solution Directions [chapter]

Onur Mutlu
2015 More than Moore Technologies for Next Generation Computer Design  
More and increasingly heterogeneous processing cores and agents/clients are sharing the memory system [6, 21, 107, 45, 46, 36, 23], leading to increasing demand for memory capacity and bandwidth along  ...  predictable performance and QoS to applications sharing the memory system (i.e., QoS-aware memory systems).  ...  and Retention-Aware Error Management for NAND Flash Memory [12].  ... 
doi:10.1007/978-1-4939-2163-8_6 fatcat:okw4kxakuja43kac65zy5c35ye

A Software-Managed Approach to Die-Stacked DRAM

Mark Oskin, Gabriel H. Loh
2015 2015 International Conference on Parallel Architecture and Compilation (PACT)  
The first is a hardware-assisted TLB shoot-down, which is a more general mechanism that is valuable beyond stacked DRAM, and enables OS-managed page caches to achieve a 27% speedup; the second is a software-implemented  ...  still provides APIs to the application layers to explicitly control die-stacked DRAM allocations.  ...  cache accesses and works in concert with the software (or hardware) page cache routine to insert prefetched pages.  ... 
doi:10.1109/pact.2015.30 dblp:conf/IEEEpact/OskinL15 fatcat:fn4mpetyhfhp5fven24s3iort4

Reactive NUCA

Nikos Hardavellas, Michael Ferdman, Babak Falsafi, Anastasia Ailamaki
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
The large working sets favor a shared cache design that maximizes the aggregate cache capacity and minimizes off-chip memory requests.  ...  and by 32% at best, while achieving performance within 5% of an ideal cache design.  ...  ACKNOWLEDGEMENTS The authors would like to thank B. Gold and S. Somogyi for their technical assistance, and T. Brecht, T.  ... 
doi:10.1145/1555754.1555779 dblp:conf/isca/HardavellasFFA09 fatcat:326qapu44fd47o5dt3qm7ghbgy

Research Problems and Opportunities in Memory Systems

2014 Supercomputing Frontiers and Innovations  
memory systems), 3) providing predictable performance and QoS to applications sharing the memory system (i.e., QoS-aware memory systems).  ...  Recent system design, application, and technology trends that require more capacity, bandwidth, efficiency, and predictability out of the memory system make it an even more important system bottleneck.  ...  Acknowledgments The source code and data sets of some of the works we have discussed or alluded to (e.g., [9, 86, 97, 111, 153, 166, 196]) are available under open source software license at our research  ... 
doi:10.14529/jsfi140302 fatcat:2zfa7zk3qjgohdsgxmkkqaamuu

A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

Giovani Gracioli, Ahmed Alhammad, Renato Mancuso, Antônio Augusto Fröhlich, Rodolfo Pellizzoni
2015 ACM Computing Surveys  
In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014.  ...  However, multicore processors have shared resources that affect the predictability of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks.  ...  Suhendra and Mitra (this work was discussed in Section 3) were the first authors to evaluate the combination of cache partitioning and cache locking in the context of multicore real-systems [Suhendra  ... 
doi:10.1145/2830555 fatcat:nckhashqprghfnbcaqqu7vk5vi

Runtime-Assisted Cache Coherence Deactivation in Task Parallel Programs

Paul Caheny, Lluc Alvarez, Mateo Valero, Miquel Moreto, Marc Casas
2018 SC18: International Conference for High Performance Computing, Networking, Storage and Analysis  
This paper proposes a hardware/software co-designed approach: the runtime system identifies data that is guaranteed by the programming model semantics to not require coherence and notifies the microarchitecture  ...  To reduce the area and power needs of the directory, recent proposals reduce its size by classifying data as private or shared, and disable coherence for private data.  ...  The ease of programmability of shared-memory architectures is granted by hardware cache coherence, which manages the cache hierarchy transparently to the software.  ... 
doi:10.1109/sc.2018.00038 fatcat:5zfpsxgbxbf7vfg36ng2k5jg6y
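The co-design summarized above has the runtime tell the hardware which data does not need coherence, so the directory never allocates entries for it. A toy sketch of that effect follows. The address-range interface (runtime_mark_noncoherent), the 64-byte block size, and the sharer-set directory are hypothetical assumptions for illustration. They are not the paper's actual hardware interface.

```python
class Directory:
    """Toy directory: tracks sharers per block, except for address ranges the
    runtime has declared as not requiring coherence."""

    def __init__(self, block=64):
        self.block = block
        self.sharers = {}      # block address -> set of core ids
        self.noncoherent = []  # (start, end) ranges, end exclusive

    def runtime_mark_noncoherent(self, start, end):
        # Hypothetical notification from the runtime to the microarchitecture.
        self.noncoherent.append((start, end))

    def _is_noncoherent(self, addr):
        return any(s <= addr < e for s, e in self.noncoherent)

    def access(self, core, addr):
        if self._is_noncoherent(addr):
            return  # coherence deactivated: no directory entry allocated
        blk = addr - addr % self.block
        self.sharers.setdefault(blk, set()).add(core)

    def entries(self):
        return len(self.sharers)
```

Every access that falls in a marked range leaves the directory untouched, so fewer entries are needed for the same workload. That reduction is the area and power saving the abstract targets.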

Cuanta

Sriram Govindan, Jie Liu, Aman Kansal, Anand Sivasubramaniam
2011 Proceedings of the 2nd ACM Symposium on Cloud Computing - SOCC '11  
solution: the interference due to shared processor caches.  ...  In this paper, we present a practical technique for predicting performance interference due to shared processor cache which works on current processor architectures and requires minimal software changes  ...  System software has very little control over such resources and they are almost entirely managed by the hardware in a best effort fashion.  ... 
doi:10.1145/2038916.2038938 dblp:conf/cloud/GovindanLKS11 fatcat:i7cz22xxn5aohirc7luxzjwblu

Experience with building a commodity Intel-based ccNUMA system

B. C. Brock, G. D. Carpenter, E. Chiprout, M. E. Dean, P. L. De Backer, E. N. Elnozahy, H. Franke, M. E. Giampapa, D. Glasco, J. L. Peterson, R. Rajamony, R. Ravindran (+3 others)
2001 IBM Journal of Research and Development  
The system can be partitioned statically or dynamically, and uses an innovative, combined hardware/software approach to support application-level performance tuning.  ...  A different approach to building these systems is to use Standard High Volume (SHV) hardware and stock software components as building blocks and assemble them with minimal investments in hardware and  ...  Acknowledgments We are indebted to many individuals for their support and encouragement,  ... 
doi:10.1147/rd.452.0207 fatcat:uomsp7ddtfadjk4ogo3l3hlo4m