Comparing the effectiveness of fine-grain memory caching against page migration/replication in reducing traffic in DSM clusters

An-Chow Lai, Babak Falsafi
2000 Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures - SPAA '00  
In this paper, we compare and contrast two techniques to improve capacity/conflict miss traffic in CC-NUMA DSM clusters.  ...  In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications.  ...  Base CC-NUMA DSM Cluster In this paper, we only consider fast and small SRAM-based block caches. Alternatively, some designs incorporate large but slow DRAM-based block caches [17, 2, 21].  ... 
doi:10.1145/341800.341811 dblp:conf/spaa/LaiF00 fatcat:xpdbdm5wmfcctazvqt2h476h5u

Optimizing Traffic in DSM Clusters: Fine-Grain Memory Caching versus Page Migration/ Replication

An-Chow Lai, Babak Falsafi
2002 Theory of Computing Systems  
In this paper, we compare and contrast two techniques to improve capacity/conflict miss traffic in CC-NUMA DSM clusters.  ...  In this paper, we compare and contrast page migration/replication and R-NUMA on simulated clusters of symmetric multiprocessors executing shared-memory applications.  ...  Base CC-NUMA DSM Cluster In this paper, we only consider fast and small SRAM-based block caches. Alternatively, some designs incorporate large but slow DRAM-based block caches [17, 2, 21].  ... 
doi:10.1007/s00224-002-1054-6 fatcat:dc7x2u6svzh55of3u2s3s6reim

The preliminary evaluation of MBP-light with two protocol policies for a massively parallel processor-JUMP-1

I. Hiroaki, K. Anjo, J. Yamamoto, J. Tanabe, M. Wakabayashi, M. Sato, H. Amano, K. Hiraki
1999 Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation  
From results of its simulation, it appears that simple operations like the tag check and the collection/generation of acknowledgment packets are mostly processed by the hardware mechanisms in MBP-light  ...  A massively parallel processor called JUMP-1 has been developed to build an efficient cache coherent-distributed shared memory (DSM) on a large system with more than 1000 processors.  ...  A part of this research was supported by the Grant-in-Aid for Scientific Research on Priority Areas, #04235130, from the Ministry of Education, Science and Culture.  ... 
doi:10.1109/fmpc.1999.750609 fatcat:j4g5uwo5fnezxmuwoeewqz3tku

Rhymes: A shared virtual memory system for non-coherent tiled many-core architectures

King Tin Lam, Jinghao Shi, Dominic Hung, Cho-Li Wang, Zhiquan Lai, Wangbin Zhu, Youliang Yan
2014 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)  
We implement and test Rhymes on the SCC port of the Barrelfish OS.  ...  A cluster-on-chip architecture, as exemplified by the Intel Single-chip Cloud Computer (SCC), promotes a software-oriented approach instead of hardware support to implementing shared memory coherence.  ...  Special thanks go to Intel China Center of Parallel Computing (ICCPC) and Beijing Soft Tech Technologies Co., Ltd. for their kind support of the SCC platform in their Wuxi data centers for this work.  ... 
doi:10.1109/padsw.2014.7097807 dblp:conf/icpads/LamSHWLZY14 fatcat:sadkzvqywjepzenwnp32nr65fi

Hardware support for flexible distributed shared memory

S.K. Reinhardt, R.W. Pfile, D.A. Wood
1998 IEEE transactions on computers  
In addition, the custom protocols are generally effective at reducing the impact of other overheads, including those due to less aggressive hardware support and larger network latencies.  ...  To explore the interaction between these approaches, we simulated four designs that add DSM acceleration hardware to a collection of off-the-shelf workstation nodes.  ...  Sufficiently high network overheads make any hardware DSM support superfluous, leaving software-only fine-grain [50, 48] or page-based [26] systems as the most cost-effective approaches to DSM in this  ... 
doi:10.1109/12.729790 fatcat:ybyu26st3vbl5mbnytscnzzxlq

MIND: In-Network Memory Management for Disaggregated Data Centers [article]

Seung-seob Lee, Yanpeng Yu, Yupeng Tang, Anurag Khandelwal, Lin Zhong, Abhishek Bhattacharjee
2021 arXiv pre-print
We find that centralizing memory management in the network permits bandwidth and latency-efficient realization of in-network cache coherence protocols, while programmable switch ASICs support other memory  ...  However, existing designs achieve performance at the cost of resource elasticity, restricting memory sharing to a single compute blade to avoid costly memory coherence traffic over the network.  ...  Gomez and Jonathan Kraft for helpful inputs throughout the work. This work is supported in part by NSF Awards #2047220, #2016422, #1916817 and their REU supplements.  ... 
arXiv:2107.00164v1 fatcat:jzktsu2ygfduxhdrbd66wl7j3a

Database architecture evolution

Stefan Manegold, Martin L. Kersten, Peter Boncz
2009 Proceedings of the VLDB Endowment  
, to service a broad user community, Small & Simple, to be comprehensible to a small team of programmers, Self-managing, to let it run out-of-the-box without hassle.  ...  The holy grail for database architecture research is to find a solution that is Scalable & Speedy, to run on anything from small ARM processors up to globally distributed compute clusters, Stable & Secure  ...  We wish to thank all (former) members of the CWI database group for their effort, inspiration and dedication to make MonetDB a success.  ... 
doi:10.14778/1687553.1687618 fatcat:az3viqkxx5hf7gmu5zoih67hgq

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 SIGARCH Computer Architecture News  
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in networks of workstations.  ...  To reduce design time and implementation cost relative to dedicated DSM systems, we decouple the functional hardware components of DSM support, allowing greater use of off-the-shelf devices.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232974.232979 fatcat:xlvt3pco3vakrhow3fr5taaefy

Decoupled hardware support for distributed shared memory

Steven K. Reinhardt, Robert W. Pfile, David A. Wood
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
This paper investigates hardware support for fine-grain distributed shared memory (DSM) in networks of workstations.  ...  To reduce design time and implementation cost relative to dedicated DSM systems, we decouple the functional hardware components of DSM support, allowing greater use of off-the-shelf devices.  ...  Babak Falsafi and Shubu Mukherjee contributed to the development of the simulator used in this paper. Mark Hill and Jim Larus provided valuable comments on drafts of this paper.  ... 
doi:10.1145/232973.232979 dblp:conf/isca/ReinhardtPW96 fatcat:3brtvx2ihjcczkrrqbw7qj6wra

The SGI Origin

James Laudon, Daniel Lenoski
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
of up to 512 nodes interconnected by a scalable Craylink network.  ...  In addition, performance results are presented for the NAS Parallel Benchmarks V2.2 and the SPLASH2 applications.  ...  Acknowledgments The Origin system design resulted from the very hard work of a top-notch team of chip, board, and system engineers.  ... 
doi:10.1145/264107.264206 dblp:conf/isca/LaudonL97 fatcat:znqulcfe4rafzetvq46e76x3ji

Reactive NUMA

Babak Falsafi, David A. Wood
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
We then use detailed execution-driven simulation to show that, in practice, R-NUMA usually performs better than either a pure CC-NUMA or pure S-COMA protocol, and no more than 57% worse than the best of  ...  We first show the theoretical result that R-NUMA's worst-case performance is bounded within a small constant factor (i.e., two to three times) of the best of CC-NUMA and S-COMA.  ...  Acknowledgements We would like to thank Steve Reinhardt for helping with the development of our simulator, Beng-Hong Lim and Sandra Irani for their comments on our performance models, and Scott Breach,  ... 
doi:10.1145/264107.264205 dblp:conf/isca/FalsafiW97 fatcat:xdrodc2f6rhgbpr7shsx5ookk4

The MIT Alewife Machine

A. Agarwal, R. Bianchini, D. Chaiken, F.T. Chong, K.L. Johnson, D. Kranz, J.D. Kubiatowicz, Beng-Hong Lim, K. Mackenzie, D. Yeung
1999 Proceedings of the IEEE  
By using a combination of hardware and software mechanisms, DSM combines the nice features of all the above models and is able to achieve both the scalability of message-passing machines and the programmability  ...  Alewife supports up to 512 processing nodes connected over a scalable and cost-effective mesh network at a constant cost per node.  ...  The Alewife machine was built in cooperation with LSI Logic, Inc., Sun Microsystems, Inc., and the Information Sciences Institute at University of Southern California.  ... 
doi:10.1109/5.747864 fatcat:6ebg346wnzcqxa22ayhdrmpdni

Active memory operations

Zhen Fang, Lixin Zhang, John B. Carter, Ali Ibrahim, Michael A. Parker
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
Our implementation of AMOs is cache-coherent and requires no changes to the processor core or DRAM chips.  ...  To mitigate this problem, we propose the use of Active Memory Operations (AMOs), in which select operations can be sent to and executed on the home memory controller of data.  ...  When a process accesses data that is not in a local cache, the local DSM hardware sends a message to the data's home node to request a copy.  ... 
doi:10.1145/1274971.1275004 dblp:conf/ics/FangZCIP07 fatcat:ajzlsvdgorezbk6isb6nnlno24

Rapid hardware prototyping on RPM-2

M. Dubois, Jaeheon Jeong, Yong Ho Song, A. Moga
1998 IEEE Design & Test of Computers  
Because of its flexibility, the RPM hardware can adapt during its lifetime to the rapid evolution of technology trade-offs and new architectural ideas.  ...  Although some of these ideas have been prototyped in hardware, hardware prototypes take too long to build and are very expensive. Often, by the time a hardware prototype really works, it is obsolete.  ...  In addition, Luiz Barroso, Koray Oner, Jacqueline Chame, Sasan Iman, and Krishnan Ramamurthy participated in the design of RPM.  ... 
doi:10.1109/54.706042 fatcat:7d4o3lvjvbg3zne6w6hgepmodq

Cache-only memory architectures

F. Dahlgren, J. Torrellas
1999 Computer  
Effectively, each shared-memory module acts as a huge cache memory, giving the name COMA to the architecture.  ...  Unlike in a conventional CC-NUMA architecture, in a COMA, every shared-memory module in the machine is a cache, where each memory line has a tag with the line's address and state. As a processor  ...  Moga A, Dubois M () The effectiveness of SRAM network caches in clustered DSMs. In: International symposium on high-performance computer architecture, Las Vegas, February .  ... 
doi:10.1109/2.769448 fatcat:ozadfmvoyne5jczmnm42x4ukza