Global address space, non-uniform bandwidth: a memory system performance characterization of parallel systems

T. Stricker, T. Gross
Proceedings Third International Symposium on High-Performance Computer Architecture  
Many parallel systems offer a simple view of memory: all storage cells are addressed uniformly.  ...  Despite a uniform view of the memory, the machines differ significantly in their memory system performance (and may offer slightly different consistency models).  ...  Stricker's current address: Institut für Computer Systeme, ETH Zürich, Switzerland.  ... 
doi:10.1109/hpca.1997.569658 dblp:conf/hpca/StrickerG97 fatcat:cd33bx24afgcbfoo232sqppmhm

Bandwidth-Aware Page Placement in NUMA [article]

David Gureya, João Neto, Reza Karimi, João Barreto, Pramod Bhatotia, Vivien Quema, Rodrigo Rodrigues, Paolo Romano, Vladimir Vlassov
2020 arXiv   pre-print
Page placement is a critical problem for memory-intensive applications running on a shared-memory multiprocessor with a non-uniform memory access (NUMA) architecture.  ...  BWAP combines an analytical performance model of the target NUMA system with on-line iterative tuning of page distribution for a given memory-intensive application.  ...  Executive Agency of the European Commission under FPA 2012-0030.  ... 
arXiv:2003.03304v1 fatcat:jjuuahfbqzfjnho7gqchxzlgz4

Modeling parallel bandwidth

Micah Adler, Phillip B. Gibbons, Vijaya Ramachandran, Yossi Matias
1997 Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures - SPAA '97  
This paper provides the first detailed study of the algorithmic implications of modeling parallel bandwidth as a per-processor limitation (locally-limited) versus an aggregate limitation (globally-limited  ...  The qsm model [25], a shared-memory model with a per-processor bandwidth parameter g, denoted in this paper as the qsm(g) model.  ...  Since the addresses of the memory locations read are sorted, at most one processor reads any memory cell during any step of the central read steps, and thus the central read steps can all be performed  ... 
doi:10.1145/258492.258502 dblp:conf/spaa/AdlerGRM97 fatcat:lmgw3ifghvbuzhwamcds2s6are

On characterizing bandwidth requirements of parallel applications

Anand Sivasubramaniam, Aman Singla, Umakishore Ramachandran, H. Venkateswaran
1995 Performance Evaluation Review  
In this paper, we quantify the link bandwidth requirement on a binary hypercube topology for a set of five parallel applications.  ...  The technique presented can be useful to a system architect to synthesize the bandwidth requirements for realizing well-balanced parallel architectures. k-ary n-cube networks.  ...  Experimental Setup We have chosen a CC-NUMA (Cache Coherent Non-Uniform Memory Access) shared memory multiprocessor as the architectural platform for this study.  ... 
doi:10.1145/223586.223609 fatcat:rtimlc5tlng4doq5ceorrrxyg4

On characterizing bandwidth requirements of parallel applications

Anand Sivasubramaniam, Aman Singla, Umakishore Ramachandran, H. Venkateswaran
1995 Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems - SIGMETRICS '95/PERFORMANCE '95  
In this paper, we quantify the link bandwidth requirement on a binary hypercube topology for a set of five parallel applications.  ...  The technique presented can be useful to a system architect to synthesize the bandwidth requirements for realizing well-balanced parallel architectures. k-ary n-cube networks.  ...  Experimental Setup We have chosen a CC-NUMA (Cache Coherent Non-Uniform Memory Access) shared memory multiprocessor as the architectural platform for this study.  ... 
doi:10.1145/223587.223609 dblp:conf/sigmetrics/SivasubramaniamSRV95 fatcat:sucmkqvpdnbf5gdnyihipdsz4a

Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking [article]

Zhixiang Gu, Jose Moreira, David Edelsohn, Ariful Azad
2020 arXiv   pre-print
It is well known that SpGEMM is a memory-bound operation, and its peak performance is expected to be bound by the memory bandwidth.  ...  In this paper we characterize existing SpGEMM algorithms based on their memory access patterns and develop practical lower and upper bounds for SpGEMM performance.  ...  Hence, we primarily focus on the single-socket performance because memory bandwidth is harder to predict in Non-Uniform Memory Access (NUMA) domains.  ... 
arXiv:2002.11302v1 fatcat:3isozmxwfjdnhd5sv2jbuf6beu

Effects of communication latency, overhead, and bandwidth in a cluster architecture

Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson
1997 Proceedings of the 24th annual international symposium on Computer architecture - ISCA '97  
This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations.  ...  We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications.  ...  All but two of the applications are written in an SPMD model using Split-C [13], a parallel extension of the C programming language that provides a global address space on distributed memory machines  ... 
doi:10.1145/264107.264146 dblp:conf/isca/MartinVCA97 fatcat:ykuhh4gpevaj5eelkd6phyxyom

Effects of communication latency, overhead, and bandwidth in a cluster architecture

Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson
1997 SIGARCH Computer Architecture News  
This four-parameter characterization of communication performance is based on the LogP model [2, 14], the framework for our systematic investigation of the communication design space.  ...  This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations.  ...  All but two of the applications are written in an SPMD model using Split-C [13], a parallel extension of the C programming language that provides a global address space on distributed memory machines  ... 
doi:10.1145/384286.264146 fatcat:k4dkqrtb7zf2vg22mvf3u2y6ky

Improving effective bandwidth through compiler enhancement of global cache reuse

Chen Ding, Ken Kennedy
2004 Journal of Parallel and Distributed Computing  
While CPU speed has been improved by a factor of 6400 over the past twenty years, memory bandwidth has increased by a factor of only 139 during the same period.  ...  Both waste memory bandwidth. This dissertation pursues a software remedy.  ...  A single memory module can become the point of contention and the bottleneck of the whole parallel system.  ... 
doi:10.1016/j.jpdc.2003.09.005 fatcat:lt762atuijgefjrr4mqpm6q3wm

Bandwidth-efficient wireless multimedia communications

L. Hanzo
1998 Proceedings of the IEEE  
order to meet backward compatibility requirements with existing systems and to achieve best compromise among a range of conflicting system requirements in terms of communications quality, bandwidth requirements  ...  channel coding for wireless communications are addressed.  ...  the last chapter in the context of the global system of mobile communications known as GSM.  ... 
doi:10.1109/5.681368 fatcat:5ycvcgjfk5bjbfblcqckuva37e

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Milo M. K. Martin, Pacia J. Harper, Daniel J. Sorin, Mark D. Hill, David A. Wood
2003 SIGARCH Computer Architecture News  
For example, one of our predictors obtains almost 90% of the performance of snooping while using only 15% more bandwidth than a directory protocol (and less than half the bandwidth of snooping).  ...  Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory multiprocessors.  ...  Acknowledgments We thank Virtutech AB, the Wisconsin Condor group, and the Wisconsin Computer Systems Lab for their help and support.  ... 
doi:10.1145/871656.859642 fatcat:ps4egvgxh5ca5abiwfuodklgxe

NIC-based rate control for proportional bandwidth allocation in Myrinet clusters

A. Gulati, D.K. Panda, P. Sadayappan, P. Wyckoff
2001 International Conference on Parallel Processing, 2001.  
However, clusters are now being increasingly used in environments characterized by non-cooperating communication flows with a range of service requirements.  ...  Also, contention between flows at the endnodes has not been addressed earlier. In this paper, we explore the use of "rate control" as a means for proportional bandwidth allocation in clusters.  ...  As a result of these two trends, network traffic in clusters is now characterized by the coexistence of non-cooperating communication flows with a variety of service requirements.  ... 
doi:10.1109/icpp.2001.952075 dblp:conf/icpp/GulatiPSW01 fatcat:fmrjzdm6x5d2rklc6wlrcgtuom

Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors

Milo M. K. Martin, Pacia J. Harper, Daniel J. Sorin, Mark D. Hill, David A. Wood
2003 Proceedings of the 30th annual international symposium on Computer architecture - ISCA '03  
For example, one of our predictors obtains almost 90% of the performance of snooping while using only 15% more bandwidth than a directory protocol (and less than half the bandwidth of snooping).  ...  Destination-set prediction can improve the latency/bandwidth tradeoff in shared-memory multiprocessors.  ...  Acknowledgments We thank Virtutech AB, the Wisconsin Condor group, and the Wisconsin Computer Systems Lab for their help and support.  ... 
doi:10.1145/859618.859642 fatcat:4dkl77tagfb4vl6ohxq3fsarky

Protograph-based Bit-Interleaved Coded Modulation: A Promising Bandwidth-Efficient Design Paradigm [article]

Yi Fang, Pingping Chen, Yong Liang Guan, Francis C. M. Lau, Yonghui Li, Guanrong Chen
2021 arXiv   pre-print
a large number of communication and storage systems.  ...  FEC solution for BICM systems, and found widespread applications such as deep-space communication, satellite communication, wireless communication, optical communication, and flash-memory-based data storage  ...  to characterize the asymptotic performance and to facilitate optimization of the system.  ... 
arXiv:2112.08557v1 fatcat:szkgba5muvedvbtbrcnfndcrom

Optical memory bandwidth and multiplexing capacity in the erbium telecommunication window

J Dajczgewand, R Ahlefeldt, T Böttger, A Louchet-Chauvet, J-L Le Gouët, T Chanelière
2015 New Journal of Physics  
We study the bandwidth and multiplexing capacity of an erbium-doped optical memory for quantum storage purposes.  ...  We concentrate on the protocol ROSE (Revival of a Silenced Echo) because it has the largest potential multiplexing capacity.  ...  The research leading to these results has received funding from the People Programme (Marie Curie Actions) of the European Union's Seventh Framework Programme FP7/2007-2013/ under REA grant agreement no  ... 
doi:10.1088/1367-2630/17/2/023031 fatcat:uu4766kc4ff7nhtrwbxhzkpryu