
Low-synchronization translation lookaside buffer consistency in large-scale shared-memory multiprocessors

B. Rosenburg
1989 Proceedings of the twelfth ACM symposium on Operating systems principles - SOSP '89  
Operating systems for most current shared-memory multiprocessors must maintain translation lookaside buffer (TLB) consistency across processors.  ...  consistency on some popular commercial multiprocessors incur excessively high synchronization costs.  ...  can arise with that algorithm if locks are not acquired at consistent priority levels.  ... 
doi:10.1145/74850.74864 dblp:conf/sosp/Rosenburg89 fatcat:66zbdmfnevdkda2ctznw3dzkam

Translation-lookaside buffer consistency

P.J. Teller
1990 Computer  
Also, my thanks to the other researchers who have addressed the problem of TLB consistency, without whom this paper would not be possible, and to Susan Flynn-Hummel, Bryan Rosenburg, and Edith Schonberg  ...  Three of these require virtual-address, general-purpose caches kept Shared-memory multiprocessors with multiple translation-lookaside buffers must deal with a cache consistency problem.  ...  Figure 5. A multiprocessor with hardware support for validation. Table 1. Summary of solutions to the translation-lookaside buffer consistency problem.  ... 
doi:10.1109/2.55498 fatcat:2ab6fhw32jb4niwjo2viyxupg4

Translation lookaside buffer consistency: a software approach

D. L. Black, R. F. Rashid, D. B. Golub, C. R. Hill
1989 Proceedings of the third international conference on Architectural support for programming languages and operating systems - ASPLOS-III  
We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software.  ...  This algorithm has been implemented on several multiprocessors, and is in regular production use.  ...  Teller et al. [25] proposes algorithms for TLB consistency on large-scale multiprocessors.  ... 
doi:10.1145/70082.68193 dblp:conf/asplos/BlackRGHB89 fatcat:tuxrrdlb2ba5llw43ixsdpvfie

Translation lookaside buffer consistency: a software approach

D. L. Black, R. F. Rashid, D. B. Golub, C. R. Hill
1989 SIGARCH Computer Architecture News  
We discuss the translation lookaside buffer (TLB) consistency problem for multiprocessors, and introduce the Mach shootdown algorithm for maintaining TLB consistency in software.  ...  This algorithm has been implemented on several multiprocessors, and is in regular production use.  ...  Teller et al. [25] proposes algorithms for TLB consistency on large-scale multiprocessors.  ... 
doi:10.1145/68182.68193 fatcat:3yqj25j4azeiddpo6dk2px54p4

The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches

Xiaogang Qiu, M. Dubois
2008 IEEE transactions on computers  
To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a translation lookaside buffer (TLB).  ...  It scales with memory data set sizes, physical memory sizes, and number of cores in a multiprocessor. Moreover, SLB entry flushes and shootdowns due to physical memory management are eliminated.  ...  She demonstrated that TLB consistency scales poorly in large-scale multiprocessors and moving the TLB to memory can radically solve the problem.  ... 
doi:10.1109/tc.2008.108 fatcat:c6lyqm76njblhbqtukzfmsmtfq

RPM: a rapid prototyping engine for multiprocessor systems

L.A. Barroso, S. Iman, M. Dubois, K. Ramamurthy
1995 Computer  
and stores in shared-memory machines.  ...  In shared-memory systems, the large latencies of loads and stores on shared data is also a problem, which is usually solved by complex shared-memory access mechanisms.  ...  In particular, we want to thank Per Stenström from Lund University (Sweden), Massoud  ... 
doi:10.1109/2.347997 fatcat:zss5eszuffhzxknjduo5szw7cu

The M-Machine multicomputer

M. Fillo, S.W. Keckler, W.J. Dally, N.P. Carter, A. Chang, Y. Gurevich, W.S. Lee
1995 Proceedings of the 28th Annual International Symposium on Microarchitecture  
A user accessible message passing system yields fast communication and synchronization between nodes.  ...  Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms.  ...  The external memory interface consists of the SDRAM controller and a local translation lookaside buffer (LTLB) used to cache local page external memory.  ... 
doi:10.1109/micro.1995.476822 dblp:conf/micro/FilloKDCCGL95 fatcat:o523frqb7jdrjmj7vfprfk6kk4

The M-machine multicomputer

Marco Fillo, Stephen W. Keckler, William J. Dally, Nicholas P. Carter, Andrew Chang, Yevgeny Gurevich, Whay S. Lee
1997 International journal of parallel programming  
A user accessible message passing system yields fast communication and synchronization between nodes.  ...  Rapid access to remote memory is provided transparently to the user with a combination of hardware and software mechanisms.  ...  The external memory interface consists of the SDRAM controller and a local translation lookaside buffer (LTLB) used to cache local page external memory.  ... 
doi:10.1007/bf02700035 fatcat:b5utkjnigjhl5cp7ofxrhxpj7e

DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory

Carlos Villavieja, Vasileios Karakostas, Lluis Vilanova, Yoav Etsion, Alex Ramirez, Avi Mendelson, Nacho Navarro, Adrian Cristal, Osman S. Unsal
2011 2011 International Conference on Parallel Architectures and Compilation Techniques  
Translation Lookaside Buffers (TLBs) are ubiquitously used in modern architectures to cache virtual-to-physical mappings and, as they are looked up on every memory access, are paramount to performance  ...  First, we show that both TLB shootdown cost and frequency increase with the number of processors and project that software-based TLB shootdowns would thwart the performance of large multiprocessors.  ...  Therefore, all processors employ a Translation Lookaside Buffer (TLB), which caches address translation information in an on-chip, content-addressable memory, and thereby eliminates the need for a full  ... 
doi:10.1109/pact.2011.65 dblp:conf/IEEEpact/VillaviejaKVERMNCU11 fatcat:l7xbvqh3rnerlaa5mzbd5mqrsa

The Design of RPM: An FPGA-based Multiprocessor Emulator

K. Oner, L.A. Barroso, S. Iman, Jaeheon Jeong, K. Ramamurthy, M. Dubois
1995 Third International ACM Symposium on Field-Programmable Gate Arrays  
In addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, greatly simplify the design of large circuits.  ...  For cost reasons, the use of FPGAs in RPM is limited to the memory controllers, while the rest of the emulator, including the processors, memories and interconnect, is built with off-the-shelf components  ...  The Processor and its First-Level Cache As a first-level cache (FLC) memory RAM1 is divided into five parts (see Fig. 2 ): the data memory (up to 1 Mbytes), the cache directory, the TLB (Translation Lookaside  ... 
doi:10.1109/fpga.1995.241946 fatcat:zatkxdrde5amhf7oa5olkgoboy

Disco

Edouard Bugnion, Scott Devine, Mendel Rosenblum
1997 ACM SIGOPS Operating Systems Review  
In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort.  ...  code and the file system buffer cache.  ...  Our colleagues Kinshuk Govil, Dan Teodosiu, and Ben Verghese participated in many lively discussions on Disco and carefully read drafts of the paper.  ... 
doi:10.1145/269005.266672 fatcat:uvcwdv63yjgqbaapjkfxbzs474

Delayed consistency and its effects on the miss rate of parallel programs

Michel Dubois, Jin Chin Wang, Luiz A. Barroso, Kangwoo Lee, Yung-Syau Chen
1991 Proceedings of the 1991 ACM/IEEE conference on Supercomputing - Supercomputing '91  
In cache based multiprocessors a protocol must maintain coherence among replicated copies of shared writable data.  ...  In this paper, we introduce several implementations of delayed consistency for cache-based systems in the framework of a weakly-ordered consistency model.  ...  Acknowledgments The idea of the stale state in the cache to implement partial invalidations of blocks is due to Andrew Glew.  ... 
doi:10.1145/125826.125941 dblp:conf/sc/DuboisWBLC91 fatcat:ar5yqcyvlzdxfncy2bqhuitd2e

The design of RPM

Koray Öner, Luiz A. Barroso, Sasan Iman, Jaeheon Jeong, Krishnan Ramamurthy, Michel Dubois
1995 Proceedings of the 1995 ACM third international symposium on Field-programmable gate arrays - FPGA '95  
In addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, greatly simplify the design of large circuits.  ...  For cost reasons, the use of FPGAs in RPM is limited to the memory controllers, while the rest of the emulator, including the processors, memories and interconnect, is built with off-the-shelf components  ...  The Processor and its First-Level Cache As a first-level cache (FLC) memory RAM1 is divided into five parts (see Fig. 2 ): the data memory (up to 1 Mbytes), the cache directory, the TLB (Translation Lookaside  ... 
doi:10.1145/201310.201321 dblp:conf/fpga/OnerBIJRD95 fatcat:lthonhjjnfceromaf5hetrfvba

The GPU Computing Era

John Nickolls, William J Dally
2010 IEEE Micro  
It provides a 40-bit virtual address space to each application context and maps it to the physical address space with translation lookaside buffers and page tables.  ...  ECC memory Fermi introduces ECC memory protection to enhance data integrity in large-scale GPU computing systems.  ... 
doi:10.1109/mm.2010.41 fatcat:tmcgmo7v5zasbpakpqk37anni4

Parallelization of a dynamic unstructured application using three leading paradigms

Leonid Oliker, Rupak Biswas
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures.  ...  We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a  ...  Single-processor cache performance and translation lookaside buffer (TLB) reuse are extremely poor.  ... 
doi:10.1145/331532.331571 dblp:conf/sc/OlikerB99 fatcat:3fbdskwh3bb3dnpwm73zag4xlm