The implementation of a coherent memory abstraction on a NUMA multiprocessor: experiences with platinum

A. Cox, R. Fowler
1989 ACM SIGOPS Operating Systems Review  
Coherent memory makes programming NUMA multiprocessors easier for the user while attaining a level of performance comparable with hand-tuned programs.  ...  PLATINUM is an operating system kernel with a novel memory management system for Non-Uniform Memory Access (NUMA) multiprocessor architectures.  ...  We thank the referees and Hugh Lauer for their constructive comments. Niki Fowler deserves special credit for her editorial assistance on the revised version of this paper.  ... 
doi:10.1145/74851.74855 fatcat:t6czscutbvgshgtdt7ui4t6eoy

Evaluation of multiprocessor memory systems using off-line optimal behavior

William J. Bolosky, Michael L. Scott
1992 Journal of Parallel and Distributed Computing  
In recent years, much effort has been devoted to analyzing the performance of distributed memory systems for multiprocessors.  ...  Such systems usually consist of a set of memories or caches, some device such as a bus or switch to connect the memories and processors, and a policy for determining when to put which addressable objects  ...  Most of our applications were provided by others: in addition to the PLATINUM C-Threads applications from Rob and Alan, the Presto applications came from the Munin group at Rice University; the SPLASH  ... 
doi:10.1016/0743-7315(92)90051-n fatcat:esusar3u4ndchnupc7p5m2dduy

Exploiting operating system support for dynamic page placement on a NUMA shared memory multiprocessor

Richard P. LaRowe, James T. Wilkes, Carla S. Ellis
1991 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '91  
The implementation of a coherent memory abstraction on a NUMA multiprocessor: Experiences with Platinum.  ...  Scalable shared memory multiprocessors, on the other hand, tend to present at least some degree of non-uniformity of memory access to the programmer, making the NUMA class an important one to consider.  ... 
doi:10.1145/109625.109639 dblp:conf/ppopp/LaRoweWE91 fatcat:zqgn6u57rndjnm22jfv3crrioe

NUMA policies and their relation to memory architecture

William J. Bolosky, Michael L. Scott, Robert P. Fitzgerald, Robert J. Fowler, Alan L. Cox
1991 ACM SIGOPS Operating Systems Review  
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs.  ...  We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves  ...  Because Presto was originally implemented on a Sequent Symmetry, a coherent cache machine, its applications were written without consideration of NUMA memory issues.  ... 
doi:10.1145/106974.106994 fatcat:54vlt7bddzd6nkcoit7qlgd6wm

NUMA policies and their relation to memory architecture

William J. Bolosky, Michael L. Scott, Robert P. Fitzgerald, Robert J. Fowler, Alan L. Cox
1991 SIGPLAN notices  
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs.  ...  We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves  ...  Because Presto was originally implemented on a Sequent Symmetry, a coherent cache machine, its applications were written without consideration of NUMA memory issues.  ... 
doi:10.1145/106973.106994 fatcat:gycerao7sjdc3o2jkn6pdudg2q

NUMA policies and their relation to memory architecture

William J. Bolosky, Michael L. Scott, Robert P. Fitzgerald, Robert J. Fowler, Alan L. Cox
1991 Proceedings of the fourth international conference on Architectural support for programming languages and operating systems - ASPLOS-IV  
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs.  ...  We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves  ...  Because Presto was originally implemented on a Sequent Symmetry, a coherent cache machine, its applications were written without consideration of NUMA memory issues.  ... 
doi:10.1145/106972.106994 dblp:conf/asplos/BoloskySFFC91 fatcat:cf3yu7o5cbczvnannrxmuryyya

NUMA policies and their relation to memory architecture

William J. Bolosky, Michael L. Scott, Robert P. Fitzgerald, Robert J. Fowler, Alan L. Cox
1991 SIGARCH Computer Architecture News  
Multiprocessor memory reference traces provide a wealth of information on the behavior of parallel programs.  ...  We compare the performance of our optimal policy with that of three implementable policies (two of which appear in previous work), on a variety of applications, with varying relative speeds for page moves  ...  Because Presto was originally implemented on a Sequent Symmetry, a coherent cache machine, its applications were written without consideration of NUMA memory issues.  ... 
doi:10.1145/106975.106994 fatcat:h64q7qubqncobofb7zqq75f6eu

A comprehensive bibliography of distributed shared memory

M. Rasit Eskicioglu
1996 ACM SIGOPS Operating Systems Review  
In the past decade, a popular research topic has been the design of systems to provide the shared memory abstraction on physically distributed memory machines.  ...  DSM has been implemented both in software (e.g., to provide the shared memory programming model on networks of workstations) and in hardware (e.g., using cache consistency protocols to support shared memory  ...  [Cox and Fowler 1989] Cox, A. L. and Fowler, R. J. The Implementation of a Coherent Memory Abstraction on a NUMA Multiprocessor: Experiences with PLATINUM.  ... 
doi:10.1145/218646.218651 fatcat:ildcgoxumvheharepblsrqm5ui

Multigrain shared memory

Donald Yeung, John Kubiatowicz, Anant Agarwal
2000 ACM Transactions on Computer Systems  
This paper introduces the design of a shared memory system that uses multiple granularities of sharing, called MGS, and presents a prototype implementation of MGS on the MIT Alewife multiprocessor.  ...  Multigrain shared memory enables the collaboration of hardware and software shared memory, thus synthesizing a single transparent shared memory address space across a cluster of multiprocessors.  ...  The Alewife Multiprocessor Alewife is a distributed memory multiprocessor that supports the shared memory abstraction in hardware.  ... 
doi:10.1145/350853.350871 fatcat:s32qyjg7wra7jc6iho426iczra

Software cache coherence for large scale multiprocessors

L.I. Kontothanassis, M.L. Scott
Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture  
This protocol runs on NCC-NUMA machines, in which a global physical address space allows processors to fill cache lines from remote memory.  ...  It must be implemented with caches in order to perform well, however, and caches require a coherence mechanism to ensure that processors reference current data.  ...  Acknowledgements Our thanks to Ricardo Bianchini and Jack Veenstra for the long nights of discussions, idea exchanges and suggestions that helped make this paper possible.  ... 
doi:10.1109/hpca.1995.386534 dblp:conf/hpca/KontothanassisS95 fatcat:rk5qkw7zzzbrhdnzasjlxawyqe

Single system image: A survey

Philip Healy, Theo Lynn, Enda Barrett, John P. Morrison
2016 Journal of Parallel and Distributed Computing  
A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined and insights gained from hands-on experience are summarized.  ...  computing resources are aggregated and presented via an interface that maintains the illusion of interaction with a single system.  ...  Acknowledgments The authors wish to thank Ian Lee for helping to gather the data depicted in Fig. 2.  ... 
doi:10.1016/j.jpdc.2016.01.004 fatcat:6eqzcmrmu5eptmktzn52i7yy7e

Optimisation of computational fluid dynamics applications on multicore and manycore architectures

Ioan Hadade, Luca Di Mare, William Jones, Engineering And Physical Sciences Research Council, Rolls-Royce Group Plc
2019
The implementation of all of these optimisations led to application speed-ups ranging between 2.7X and 3X on the multicore CPUs and 5.7X to 24X on the manycore processors.  ...  On the manycore architectures, running more than one thread per physical core is found to be crucial for good performance on processors with in-order core designs but not required on out-of-order architectures  ...  It is therefore recommended that some form of abstraction is implemented with regards to the layout in memory of such data structures so that a switch between different implementations can be performed  ... 
doi:10.25560/67278 fatcat:qgrck6ou4vfv7eqschghxnxsd4