
The impact of exploiting instruction-level parallelism on shared-memory multiprocessors

V.S. Pai, P. Ranganathan, H. Abdel-Shafi, S. Adve
1999 IEEE transactions on computers  
This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latency-hiding optimization of software prefetching.  ...  Our results suggest the need for additional latency hiding or reducing techniques for ILP systems, such as software clustering of load misses and producer-initiated communication.  ...  Overall, our results suggest that, compared to previous-generation shared-memory systems, ILP-based systems have a greater need for additional techniques to tolerate or reduce memory latency.  ... 
doi:10.1109/12.752663 fatcat:r3ouuxvvoze2xbhpkcqmdhf4vi

Comparative Evaluation of Latency-Tolerating and -Reducing Techniques for Hardware-Only and Software-Only Directory Protocols

Håkan Grahn, Per Stenström
2000 Journal of Parallel and Distributed Computing  
We study in this paper how effective latency-tolerating and -reducing techniques are at cutting the memory access times for shared-memory multiprocessors with directory cache protocols managed by hardware  ...  Since software-only directory protocols handle these operations in software they will perform relatively worse unless the technique reduces the number of protocol operations.  ...  A part of this work was carried out while the authors were at the Department of Computer Engineering at Lund University.  ... 
doi:10.1006/jpdc.1999.1606 fatcat:7ggydlihm5dftkum6prqzvyhfy

OUTRIDER

Neal Clayton Crago, Sanjay Jeram Patel
2011 Proceeding of the 38th annual international symposium on Computer architecture - ISCA '11  
Moreover, instead of adding more threads as is done in modern GPUs, Outrider can tolerate memory latency with fewer threads and reduced contention for resources shared amongst threads.  ...  We present Outrider, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads.  ...  The authors thank the Trusted ILLIAC Center at the Information Trust Institute for their contribution of use of the computing cluster. The authors also wish to thank Steven S. Lumetta, John H.  ... 
doi:10.1145/2000064.2000079 dblp:conf/isca/CragoP11 fatcat:w56fto3w4vgoxamabvgcrkb2z4

OUTRIDER

Neal Clayton Crago, Sanjay Jeram Patel
2011 SIGARCH Computer Architecture News  
Moreover, instead of adding more threads as is done in modern GPUs, Outrider can tolerate memory latency with fewer threads and reduced contention for resources shared amongst threads.  ...  We present Outrider, an architecture for throughput-oriented processors that provides memory latency tolerance to improve performance on highly threaded workloads.  ...  The authors thank the Trusted ILLIAC Center at the Information Trust Institute for their contribution of use of the computing cluster. The authors also wish to thank Steven S. Lumetta, John H.  ... 
doi:10.1145/2024723.2000079 fatcat:2ny5ydqgmffkvglkm2b2v6fxka

The MIT Alewife Machine

A. Agarwal, R. Bianchini, D. Chaiken, F.T. Chong, K.L. Johnson, D. Kranz, J.D. Kubiatowicz, Beng-Hong Lim, K. Mackenzie, D. Yeung
1999 Proceedings of the IEEE  
A variety of models for parallel architectures, such as shared memory, message passing, and data flow, have converged in the recent past to a hybrid architecture form called distributed shared memory (  ...  latency tolerance.  ...  Others outside of MIT contributed as well: the authors would like to thank M. Marchetti for the multigrid code, and L. Kontothanassis for the FFT code and for assistance with writing Mod MP3D.  ... 
doi:10.1109/5.747864 fatcat:6ebg346wnzcqxa22ayhdrmpdni

Effects of multithreading on cache performance

H. Kwak, B. Lee, A.R. Hurson, Suk-Han Yoon, Woo-Jong Hahn
1999 IEEE transactions on computers  
Multithreading has emerged as one of the most promising and exciting techniques used to tolerate memory latency by exploiting thread-level parallelism.  ...  The question, however, remains as to how effective multithreading is on tolerating memory latency.  ...  ACKNOWLEDGMENTS The authors would like to thank ETRI for their support and the valuable comments from reviewers.  ... 
doi:10.1109/12.752659 fatcat:6b24u3lsnrcoflhbfb453n46sa

Comparing latency-tolerance techniques for software dsm systems

R. Pinto, R. Bianchini, C.L. Amorim
2003 IEEE Transactions on Parallel and Distributed Systems  
This paper studies the isolated and combined effects of several latency-tolerance techniques for software-based distributed shared-memory systems (software DSMs).  ...  More specifically, we focus on data prefetching, update-based coherence, and single-writer optimizations for page-based software DSMs.  ...  Finally, a few other studies proposed the use of other runtime-based latency-tolerance techniques for software DSMs. We proposed hardware support for latency tolerance in [4]. Mowry et al.  ... 
doi:10.1109/tpds.2003.1247677 fatcat:dbndp7do3bb6veaqukqcori3ia

Trends in shared memory multiprocessing

P. Stenstrom, E. Hagersten, D.J. Lilja, M. Martonosi, M. Venugopal
1997 Computer  
An "application" not shown, but nonetheless an important piece of software for a shared memory machine, is the operating system. Dominant domains • Databases.  ...  The second step is to begin filling gaps in programming models and architectures for shared memory multiprocessing.  ...  Acknowledgments We thank Yale Patt, who initiated the set of task forces that allowed us to develop our thoughts in a creative environment in Hawaii.  ... 
doi:10.1109/2.642814 fatcat:mhsgglxwfvdrtc4c4ap6eshxxa

Latency, bandwidth, and concurrent issue limitations in high-performance CFD [chapter]

W.D. Gropp, D.K. Kaushik, D.E. Keyes, B.F. Smith
2001 Computational Fluid and Solid Mechanics  
Our experimental results show that this solver adapts reasonably well to the high memory and network latencies.  ...  To achieve high performance, a parallel algorithm needs to effectively utilize the memory subsystem and minimize the communication volume and the number of network transactions.  ...  Execution time on the 333 MHz Pentium Pro ASCI Red machine for function evaluations only for a 2.8M-vertex case, comparing the performance of the hybrid (MPI/OpenMP) and the distributed memory (MPI alone  ... 
doi:10.1016/b978-008043944-0/50783-6 fatcat:7d5fmcbeizb6tnm2fa56w6b4zu

The MIT Alewife machine

Anant Agarwal, Ricardo Bianchini, David Chaiken, Kirk L. Johnson, David Kranz, John Kubiatowicz, Beng-Hong Lim, Kenneth Mackenzie, Donald Yeung
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
to provide efficient communication and synchronization; support for fine-grain computation allows many processors to cooperate on small problem sizes; and latency tolerance mechanisms, including block  ...  Four mechanisms combine to achieve these goals: software-extended coherent shared memory provides a global, linear address space; integrated message passing allows compiler and operating system designers  ...  Others outside of MIT contributed as well: we would like to thank Mike Marchetti for the Multigrid code, and Leonidas Kontothanassis for the FFT code and for assistance with writing Mod MP3D.  ... 
doi:10.1145/223982.223985 dblp:conf/isca/AgarwalBCJKKLMY95 fatcat:6bhv57cqzvdw5pvjswfrf2k6mu

A Survey on the Challenges of Implementing Physical Memory Pools

Heather Craddock, Lakshmi Prasanna Konudula, Gökhan Kul
2019 Journal of Internet Services and Information Security  
In this article, we identify enabling technologies for physical memory pools such as OS design, distributed shared memory structures and virtualization with regards to their relevance and impact on eliminating  ...  memory limits, and we discuss the challenges for physical memory pools which can be used by multiple servers.  ...  We would also like to thank Kun Cheng from Delaware State University for his valuable contributions throughout the survey process, his reviews of related papers, and his role in preparing the CLOUD 2019  ... 
doi:10.22667/jisis.2019.05.31.057 dblp:journals/jisis/CraddockKK19 fatcat:sdunwensunabrnmia4dlpubjom

Main Memory Scaling: Challenges and Solution Directions [chapter]

Onur Mutlu
2015 More than Moore Technologies for Next Generation Computer Design  
cell endurance problems, some have very high write latency/power, some have low density) that need to be overcome or tolerated.  ...  costly with conventional techniques.  ...  Part of the structure of this chapter is based on an evolving set of talks I have delivered at various venues on Scaling the Memory System in the Many-Core Era and Rethinking Memory System Design for Data-Intensive  ... 
doi:10.1007/978-1-4939-2163-8_6 fatcat:okw4kxakuja43kac65zy5c35ye

STUDY OF MEMORY ORGANIZATION AND MULTIPROCESSOR SYSTEM - USING THE CONCEPT OF DISTRIBUTED SHARED MEMORY, MEMORY CONSISTENCY MODEL AND SOFTWARE BASED DSM

Dhara Kumari
2017 International Journal of Advanced Research in Computer Science  
This paper provides an almost exhaustive survey of the existing problem and solutions in a uniform manner, presenting their memory organization, shared memory, distributed memory, distributed shared memory  ...  , Memory Consistency Model and software based DSM mechanisms and issues of importance for various DSM systems and approaches.  ...  In this paper, it includes a review of different techniques and algorithms of Software Based Distributed Shared Memory System.  ... 
doi:10.26483/ijarcs.v8i7.4406 fatcat:kjr35sorg5enxnztn3kj6kmzy4

The MIT Alewife Machine: A Large-Scale Distributed-Memory Multiprocessor [chapter]

Anant Agarwal, David Chaiken, Kirk Johnson, David Kranz, John Kubiatowicz, Kiyoshi Kurihara, Beng-Hong Lim, Gino Maa, Dan Nussbaum
1992 Scalable Shared Memory Multiprocessors  
Alewife uses a multilayered approach to achieve this goal, consisting of techniques for latency minimization and latency tolerance.  ...  The goal of the Alewife project is to discover and to evaluate techniques for automatic locality management in scalable multiprocessors.  ... 
doi:10.1007/978-1-4615-3604-8_13 fatcat:7js5232i3naevnf45t4qmrxtb4

Fault-tolerant distributed shared memory on a broadcast-based architecture

C. Katsinis, D. Hecht
2004 IEEE Transactions on Parallel and Distributed Systems  
Under the distributed shared memory (DSM) paradigm, the SOME-bus allows strong integration of the transmitter, receiver and cache controller hardware to produce a highly integrated system-wide cache coherence  ...  Backward Error Recovery fault-tolerance techniques can rely on DSM data replication and SOME-Bus broadcasts with little additional network traffic and corresponding performance degradation.  ...  Introduction In distributed shared memory (DSM) systems, an important objective of current research is the development of approaches that minimize the access time to shared data, while maintaining data  ... 
doi:10.1109/tpds.2004.83 fatcat:hvexatiapzdjtpicba4xieuv24