DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory

Carlos Villavieja, Vasileios Karakostas, Lluis Vilanova, Yoav Etsion, Alex Ramirez, Avi Mendelson, Nacho Navarro, Adrian Cristal, Osman S. Unsal
2011 2011 International Conference on Parallel Architectures and Compilation Techniques  
We then present a scalable architectural mechanism that couples a shared TLB directory with load/store queue support for lightweight TLB invalidation, and thereby eliminates the need for costly IPIs.  ...  The emergence of chip multiprocessors (CMPs) with per-core TLBs has brought the problem of TLB coherence to front stage. TLBs are kept coherent at the software level by the operating system (OS).  ...  Special thanks to the members of the Heterogeneous Architectures group at BSC and the anonymous reviewers for their comments and suggestions.  ... 
doi:10.1109/pact.2011.65 dblp:conf/IEEEpact/VillaviejaKVERMNCU11 fatcat:l7xbvqh3rnerlaa5mzbd5mqrsa

Architectural support for address translation on GPUs

Bharath Pichai, Lisa Hsu, Abhishek Bhattacharjee
2014 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14  
We show the challenges posed by GPU warp schedulers on TLBs accessed in parallel with L1 caches, which provide many well-known programmability benefits.  ...  In response, we propose modest TLB and PTW augmentations that recover most of the performance lost by introducing L1-parallel TLB access.  ...  Acknowledgments We thank the anonymous reviewers for their feedback on this work. This material is based upon work supported by the National Science Foundation under Grant No. 1337147 and 1253700.  ... 
doi:10.1145/2541940.2541942 dblp:conf/asplos/PichaiHB14 fatcat:na5uky2qbzdfdkjjlnkmojuudm

RadixVM

Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich
2013 Proceedings of the 8th ACM European Conference on Computer Systems - EuroSys '13  
RadixVM is a new virtual memory system design that enables fully concurrent operations on shared address spaces for multithreaded processes on cache-coherent multicore computers.  ...  counting scheme; and 3) it uses a new scheme to target remote TLB shootdowns and to often avoid them altogether.  ...  ACKNOWLEDGMENTS We thank Silas Boyd-Wickizer, Yandong Mao, Xi Wang, the anonymous reviewers, and our shepherd, Miguel Castro, for their feedback.  ... 
doi:10.1145/2465351.2465373 dblp:conf/eurosys/ClementsKZ13 fatcat:xouazknbfnfgtkyhba5kic3cyu

Architectural and Operating System Support for Virtual Memory

Abhishek Bhattacharjee, Daniel Lustig
2017 Synthesis Lectures on Computer Architecture  
We thank her for her support in pursuing our research endeavors. We also thank the many collaborators with whom we have explored various topics pertaining to virtual memory.  ...  Thank you also to Trey Cain, Derek Hower, Lisa Hsu, Aamer Jaleel, Yatin Manerkar, Michael Pellauer, and Caroline Trippel for the countless helpful discussions about virtual memory and memory system behavior  ...  TLBs located closer to the core generally aim to keep the common case of a hit as low-latency as possible, while TLBs located farther from the core generally aim to mitigate some of the cost of TLB misses  ... 
doi:10.2200/s00795ed1v01y201708cac042 fatcat:4re5afn53jhu7ezxwtb25ja3ca

Improving Multi-Application Concurrency Support Within the GPU Memory System [article]

Rachata Ausavarungnirun, Christopher J. Rossbach, Vance Miller, Joshua Landgraf, Saugata Ghose, Jayneel Gandhi, Adwait Jog, Onur Mutlu
2017 arXiv   pre-print
We introduce MASK, a memory hierarchy design that provides low-overhead virtual memory support for the concurrent execution of multiple applications.  ...  We find that when multiple applications spatially share the GPU, there is a significant amount of inter-core thrashing on the shared TLB within the GPU.  ...  Area and Power Consumption. We compare the area and power consumption of MASK using CACTI [59]. We compare the area and power of the L1 TLB, L2 TLB, the shared data cache and the page walk cache.  ... 
arXiv:1708.04911v1 fatcat:s3bnlapcebhblj6sawi22uc3ry

Inter-core cooperative TLB for chip multiprocessors

Abhishek Bhattacharjee, Margaret Martonosi
2010 Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems - ASPLOS '10  
With the growing dominance of chip multiprocessors (CMPs), it is necessary to examine TLB performance in the context of parallel workloads.  ...  This work is the first to present TLB prefetchers that exploit commonality in TLB miss patterns across cores in CMPs.  ...  We also thank Li-Shiuan Peh for her suggestions on improving the quality of our submission and Chris Bienia for his help with understanding the PARSEC workloads.  ... 
doi:10.1145/1736020.1736060 dblp:conf/asplos/BhattacharjeeM10 fatcat:hyxkdp3q5famjbelkz5hukc3km

Your computer is already a distributed system. Why isn't your OS?

Andrew Baumann, Simon Peter, Adrian Schüpbach, Akhilesh Singhania, Timothy Roscoe, Paul Barham, Rebecca Isaacs
2009 USENIX Workshop on Hot Topics in Operating Systems  
Processor caches and TLBs replicate data in hardware for performance, with the OS sometimes handling consistency as with TLB shootdown.  ...  Increasingly sophisticated power management allows cores, memory systems, and peripheral devices to be put into a variety of low-power states, with important implications for how the OS functions: if a  ... 
dblp:conf/hotos/BaumannPSSRBI09 fatcat:y552cm2q5fdtfdrgigf2zqgqgq

BarTLB: Barren page resistant TLB for managed runtime languages

Xin Tong, Andreas Moshovos
2014 2014 IEEE 32nd International Conference on Computer Design (ICCD)  
(b) Selective In-Cache Translation Caching (SICTC) avoids installing barren pages in the TLB by augmenting one way of a virtually-indexed, physically-tagged L1 data cache with virtual tags.  ...  This work also proposes (1) a low-cost barren page identification technique, and (2) two simple, low-cost techniques for improving TLB performance: (a) The Barren Page First (BPF) replacement policy extends  ...  levels of cache hierarchy that may be representative of power-optimized processor cores.  ... 
doi:10.1109/iccd.2014.6974692 dblp:conf/iccd/TongM14 fatcat:horikwucqjblxj52idhe37yhsq

Reactive NUCA

Nikos Hardavellas, Michael Ferdman, Babak Falsafi, Anastasia Ailamaki
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
R-NUCA cooperates with the operating system to support intelligent placement, migration, and replication without the overhead of an explicit coherence mechanism for the on-chip last-level cache.  ...  Increases in on-chip communication delay and the large working sets of server and scientific workloads complicate the design of the on-chip last-level cache for multicore processors.  ...  Somogyi for their technical assistance, and T. Brecht, T. Strigkos, and the anonymous reviewers for their feedback on earlier drafts of this paper.  ... 
doi:10.1145/1555754.1555779 dblp:conf/isca/HardavellasFFA09 fatcat:326qapu44fd47o5dt3qm7ghbgy

Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks

Ben Gras, Kaveh Razavi, Herbert Bos, Cristiano Giuffrida
2018 USENIX Security Symposium  
Acknowledgements The authors would like to thank the anonymous reviewers for their thoughtful feedback.  ...  We would also like to thank Colin Percival, Yuval Yarom, and Taylor 'Riastradh' Campbell for feedback on early versions of this paper.  ...  ., L1), consists of two parts, one that caches translations for code pages, called L1 instruction TLB (L1 iTLB), and one that caches translations for data pages, called L1 data TLB (L1 dTLB).  ... 
dblp:conf/uss/GrasRBG18 fatcat:wuxg6ilndnhdppn4kbekyjjwni

2018 Index IEEE Computer Architecture Letters Vol. 17

2019 IEEE computer architecture letters  
-June 2018 80-83 TLB Shootdown Mitigation for Low-Power Many-Core Servers with L1 Virtual Caches. Pham, B., +, LCA Jan.  ...  -June 2018 80-83 Park, S., see Min, D., LCA July-Dec. 2018 245-248 Pham, B., Hower, D., Bhattacharjee, A., and Cain, T., TLB Shootdown Mitigation for Low-Power Many-Core Servers with L1 Virtual Caches  ... 
doi:10.1109/lca.2019.2901240 fatcat:ofxkmrips5ezte6rljageeen34

Techniques for Shared Resource Management in Systems with Throughput Processors [article]

Rachata Ausavarungnirun
2018 arXiv   pre-print
We introduce changes to the memory hierarchy for systems with GPUs that allow the memory hierarchy to be aware of both CPU and GPU applications' characteristics.  ...  We propose changes to the cache management and memory scheduling mechanisms to mitigate intra-application interference in GPGPU applications.  ...  Onur Mutlu, for providing me with great research environment. He taught me many important aspects of research and shaped me into the researcher I am today.  ... 
arXiv:1803.06958v1 fatcat:3mqbwegpkvdrpk6sqwb3ooyh7e

Efficient virtual memory for big memory servers

Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, Michael M. Swift
2013 SIGARCH Computer Architecture News  
Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory.  ...  To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space  ...  We thank Wisconsin Computer Architecture Affiliates for their feedback on an early version of the work. We thank Richardson Addai-Mununkum for proof-reading our drafts.  ... 
doi:10.1145/2508148.2485943 fatcat:vix4kkpe5veefmas7inuv72uay

Efficient virtual memory for big memory servers

Arkaprava Basu, Jayneel Gandhi, Jichuan Chang, Mark D. Hill, Michael M. Swift
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
Our analysis shows that many "big-memory" server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory.  ...  To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process's linear virtual address space with a direct segment, while page mapping the rest of the virtual address space  ...  We thank Wisconsin Computer Architecture Affiliates for their feedback on an early version of the work. We thank Richardson Addai-Mununkum for proof-reading our drafts.  ... 
doi:10.1145/2485922.2485943 dblp:conf/isca/BasuGCHS13 fatcat:2p7dghs7g5axrn7dh2tttcufoe

Partially Separated Page Tables for Efficient Operating System Assisted Hierarchical Memory Management on Heterogeneous Architectures

B. Gerofi, A. Shimada, A. Hori, Y. Ishikawa
2013 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
Heterogeneous architectures, where a multicore processor is accompanied with a large number of simpler, but more power-efficient CPU cores optimized for parallel workloads, are receiving a lot of attention  ...  CPU core in TLB invalidation only if it is absolutely necessary.  ...  We would like to express our gratitude to Intel Japan for providing the hardware, software and technical support associated with the Intel® Xeon Phi™ product family.  ... 
doi:10.1109/ccgrid.2013.59 dblp:conf/ccgrid/GerofiSHI13 fatcat:db6bcbjxpvhdflg7eh254tc5pa