Reducing the communication overhead of dynamic applications on shared memory multiprocessors

A. Sivasubramaniam
Proceedings Third International Symposium on High-Performance Computer Architecture  
The goal of this research is to reduce the read and write latencies of applications with dynamic communication behavior by employing intelligent sender-initiated data transfer mechanisms.  ...  Towards this goal, we present a set of write primitives that lower the communication overhead for shared memory accesses governed by locks.  ...  Acknowledgments The author would like to thank the members of the architecture group at Georgia Tech for the many helpful discussions leading to the ideas in this paper.  ... 
doi:10.1109/hpca.1997.569660 dblp:conf/hpca/Sivasubramaniam97 fatcat:g7jhtrw3fbaedhpboupv4df44q

Exploration of distributed shared memory architectures for NoC-based multiprocessors

Matteo Monchiero, Gianluca Palermo, Cristina Silvano, Oreste Villa
2007 Journal of systems architecture  
The data allocation on the physically distributed shared memory space is dynamically managed by an on-chip Hardware Memory Management Unit.  ...  Experimental results show the impact of different NoC topologies and distributed shared memory configurations for a selected set of parallel benchmark applications from the power/performance perspective  ...  These approaches are based on static or, for more complex approaches, dynamic profiling of the application, but they are not so effective when the memory utilization changes dynamically with the system  ... 
doi:10.1016/j.sysarc.2007.01.008 fatcat:6jjvd42x2vetdmai3ftipxlg5e

Exploration of Distributed Shared Memory Architectures for NoC-based Multiprocessors

Matteo Monchiero, Gianluca Palermo, Cristina Silvano, Oreste Villa
2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation  
The data allocation on the physically distributed shared memory space is dynamically managed by an on-chip Hardware Memory Management Unit.  ...  Experimental results show the impact of different NoC topologies and distributed shared memory configurations for a selected set of parallel benchmark applications from the power/performance perspective  ...  These approaches are based on static or, for more complex approaches, dynamic profiling of the application, but they are not so effective when the memory utilization changes dynamically with the system  ... 
doi:10.1109/icsamos.2006.300821 dblp:conf/samos/MonchieroPSV06 fatcat:cu6537637na4vgk3bfthdjxuoe

Disco

Edouard Bugnion, Scott Devine, Mendel Rosenblum
1997 ACM SIGOPS Operating Systems Review  
In this paper we examine the problem of extending modern operating systems to run efficiently on large-scale shared memory multiprocessors without a large implementation effort.  ...  To reduce the memory overheads associated with running multiple operating systems, we have developed techniques where the virtual machines transparently share major data structures such as the program  ...  Our colleagues Kinshuk Govil, Dan Teodosiu, and Ben Verghese participated in many lively discussions on Disco and carefully read drafts of the paper.  ... 
doi:10.1145/269005.266672 fatcat:uvcwdv63yjgqbaapjkfxbzs474

Classifying and alleviating the communication overheads in matrix computations on large-scale NUMA multiprocessors

Yi-Min Wang, Hsiao-Hsi Wang, Ruei-Chuan Chang
1998 Journal of Systems and Software  
The high communication cost results in two main overheads, remote memory access and memory contention, and they reduce the performance of parallel applications on large-scale shared-memory NUMA multiprocessors  ...  Large-scale, shared-memory multiprocessors have non-uniform memory access (NUMA) costs. The high communication cost dominates the source of matrix computations' execution.  ...  Communication cost results in two main overheads, remote memory access and memory contention, and they reduce the performance of matrix computations on NUMA multiprocessors.  ... 
doi:10.1016/s0164-1212(98)10040-7 fatcat:uhsirwqzsnfv5jthfqa4vjtmju

Using processor affinity in loop scheduling on shared-memory multiprocessors

E.P. Markatos, T.J. LeBlanc
1994 IEEE Transactions on Parallel and Distributed Systems  
In this paper we consider a third dimension to the problem of loop scheduling on shared-memory multiprocessors: communication overhead caused by accesses to non-local data.  ...  We conclude that loop scheduling algorithms for shared-memory multiprocessors cannot afford to ignore the location of data, particularly in light of the increasing disparity between processor and memory  ...  Our experiments on the Iris confirm that communication overhead is a dominant factor in application performance on modern shared-memory multiprocessors.  ... 
doi:10.1109/71.273046 fatcat:c6qwvhnjezh7fihxnhckix245y

The Jrpm system for dynamically parallelizing sequential Java programs

M.K. Chen, K. Olukotun
2003 IEEE Micro  
Unfortunately, current multiprocessor architectures must communicate dependencies throughout the multiple layers of memory hierarchy.  ...  This dynamic parallelization system overcomes the limitations of conventional parallelizing compiler and multiprocessor technology.  ...  Acknowledgments We thank Lance Hammond for his valuable feedback and simulator support, and Tim Wilkinson for his support of the Kaffe virtual machine.  ... 
doi:10.1109/mm.2003.1261384 fatcat:337da7zgxbdtrgn2aqu72hxqai

A New Token-Based Channel Access Protocol for Wavelength Division Multiplexed Multiprocessor Interconnects

Joon-Ho Ha, Timothy Mark Pinkston
2000 Journal of Parallel and Distributed Computing  
Simulation results indicate that the proposed scheme can significantly increase the performance of protocols based on preallocation and those based on preallocation-controlled reservation of multiple channels  ...  The proposed token-based time division multiple access protocol minimizes latency by allowing dynamic allocation of slots to use channels efficiently.  ...  Given typical communication behavior of shared memory multiprocessors, the use of a token in conjunction with TDMA has tremendous potential to reduce packet latency by minimizing bandwidth wastage due  ... 
doi:10.1006/jpdc.1999.1599 fatcat:lofnf7nlmrbtnernbfbf6exsry

Modelling and Evaluation of Multiprocessor Architecture

Preeti Rajput, Varsha Kumari
2012 International Journal of Computer Applications  
This scheme has been implemented on Linearly Extensible Triangle (LEΔ) and Linearly Extensible Tree (LET) which reduces the Load Imbalance Factor (LIF) and also the execution time of parallel tasks assigned  ...  Many load balancing policies achieve high system performance by increasing the utilization of CPU, memory, or a combination of CPU and memory [3].  ...  There are two types of multiprocessor models: shared-memory and message passing system. The shared memory model has single address space and provides a global memory shared by all processors.  ... 
doi:10.5120/8323-1371 fatcat:jlic2gtlefe75mfaub7me7sfqm

Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels

Islam Harb, Wu-Chun Feng
2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS)  
The lack of support for explicit synchronization in GPUs between the streaming multiprocessors (SMs) adversely impacts the ability of GPUs to efficiently perform inter-block communication  ...  Although this topic has been addressed in previous research studies, there has been neither a solid quantification of such overhead, nor guidance on when to use each of the different approaches.  ...  Intra-block synchronization coordinates the threads within a streaming multiprocessor (SM) in the context of shared on-chip memory.  ... 
doi:10.1109/mascots.2016.58 dblp:conf/mascots/HarbF16 fatcat:s5q4kd2arvcpziwovbr67k4rvi

Disco: running commodity operating systems on scalable multiprocessors

Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum
1997 ACM Transactions on Computer Systems  
Our colleagues Kinshuk Govil, Dan Teodosiu, and Ben Verghese participated in many lively discussions on Disco and carefully read drafts of the paper.  ...  This study is part of the Stanford FLASH project, funded by ARPA grant DABT63-94-C-0054. Ed Bugnion is supported in part by an NSF Graduate Research Fellowship.  ...  Disco communicates through shared-memory in most cases.  ... 
doi:10.1145/265924.265930 fatcat:t5pdtgvenrforgy5mtnzbwqth4

Disco

Edouard Bugnion, Scott Devine, Mendel Rosenblum
1997 Proceedings of the sixteenth ACM symposium on Operating systems principles - SOSP '97  
Our colleagues Kinshuk Govil, Dan Teodosiu, and Ben Verghese participated in many lively discussions on Disco and carefully read drafts of the paper.  ...  This study is part of the Stanford FLASH project, funded by ARPA grant DABT63-94-C-0054. Ed Bugnion is supported in part by an NSF Graduate Research Fellowship.  ...  The virtual subnet and networking interfaces of Disco also use copy-on-write mappings to reduce copying and to allow for memory sharing.  ... 
doi:10.1145/268998.266672 dblp:conf/sosp/BugnionDR97 fatcat:g3pofnftqrabri5qef5zzzwsfm

The scalability of multigrain systems

Donald Yeung
1999 Proceedings of the 13th international conference on Supercomputing - ICS '99  
On five shared memory applications, the performance model is accurate to within 18% of measured runtime for four applications, and within 22% for all five.  ...  Researchers have recently proposed coupling small- to medium-scale multiprocessors to build large-scale shared memory machines, known as multigrain shared memory systems.  ...  The technique can be applied on any software DSM system that supports a release consistent memory model, and that makes use of delayed coherence to reduce internode communication.  ... 
doi:10.1145/305138.305203 dblp:conf/ics/Yeung99 fatcat:eaao3nq4cfbsdmxy5gnrhyqr5u

Power Aware Reconfigurable Multiprocessor for Elliptic Curve Cryptography

Madhura Purnaprajna, Christoph Puttmann, Mario Porrmann
2008 Design, Automation and Test in Europe  
A finite field multiplication in GF(2^233) was chosen as a sample application to evaluate the performance on the QuadroCore reconfigurable multiprocessor architecture.  ...  Further, via reconfiguration to suit the application, power savings of about 24% were noted in UMC's 90nm standard cell technology.  ...  Acknowledgement The research described in this paper was funded in part by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung - BMBF), registered there under grant  ... 
doi:10.1109/date.2008.4484880 dblp:conf/date/PurnaprajnaPP08 fatcat:jlnsewxx75c3fhil5duqqdohwe

Power aware reconfigurable multiprocessor for elliptic curve cryptography

Madhura Purnaprajna, Christoph Puttmann, Mario Porrmann
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
A finite field multiplication in GF(2^233) was chosen as a sample application to evaluate the performance on the QuadroCore reconfigurable multiprocessor architecture.  ...  Further, via reconfiguration to suit the application, power savings of about 24% were noted in UMC's 90nm standard cell technology.  ...  Acknowledgement The research described in this paper was funded in part by the Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung - BMBF), registered there under grant  ... 
doi:10.1145/1403375.1403727 fatcat:qr6i5acebfaj5nz2rhbfovlfmy
Showing results 1 to 15 of 10,457.