1,191 Hits in 2.7 sec

The robustness of NUMA memory management

Richard P. LaRowe, Carla Schlatter Ellis, Laurence S. Kaplan
1991 Proceedings of the thirteenth ACM symposium on Operating systems principles - SOSP '91  
Acknowledgements The authors wish to thank the SOSP program committee and outside reviewers for their helpful suggestions for improving this paper.  ...  In any case, the results of this subsection demonstrate the primary conclusion of this paper. NUMA memory management is robust.  ...  have suggested that NUMA memory management policy depends on the memory reference patterns of applications and on the target architectures.  ... 
doi:10.1145/121132.121158 dblp:conf/sosp/LaRoweEK91 fatcat:73gnafehjjdnfklf56s3n5l5qe

Improving the scalability of neutron cross-section lookup codes on multicore NUMA system [article]

Kazutomo Yoshii, John Tramm, Andrew Siegel, Pete Beckman
2019 arXiv pre-print
memory access (NUMA) systems.  ...  We explain how the physical memory allocation inside the kernel affects the multicore scalability of XSBench.  ...  Department of Energy Office of Science laboratory, is operated under Contract No. DE-AC02-06CH11357. The U.S.  ... 
arXiv:1909.03632v1 fatcat:tlg5i6pxg5f3lfgvwwhearbwqu

Challenges of memory management on modern NUMA systems

Fabien Gaud, Baptiste Lepers, Justin Funston, Mohammad Dashti, Alexandra Fedorova, Vivien Quéma, Renaud Lachaize, Mark Roth
2015 Communications of the ACM  
Acknowledgments We thank Oracle Labs and the British Columbia Innovation Council for funding this work.  ...  NUMA Memory Placement Strategies The results in Figure 2 and the table motivate a NUMA memory-management algorithm that places importance on congestion management, rather than focusing solely on reducing  ...  Linux exposes manual NUMA memory-management functions to programmers through the libnuma library and associated system calls.  ... 
doi:10.1145/2814328 fatcat:2toixgaz3zg3zb47dwmmy3rufi

NUMA obliviousness through memory mapping

Mrunal Gawade, Martin Kersten
2015 Proceedings of the 11th International Workshop on Data Management on New Hardware - DaMoN'15  
Hence, setting explicit process and memory affinity results into a robust execution in NUMA oblivious plans.  ...  In this paper we explore the role of memory mapped storage to provide transparent data access in a NUMA environment, without the need of explicit data partitioning.  ...  As it uses a dedicated buffer manager, rather than memory mapped storage, a comparison with Vectorwise (See Figure 9 ) provides a perspective of the possible role of NUMA in its execution performance.  ... 
doi:10.1145/2771937.2771948 dblp:conf/damon/GawadeK15 fatcat:zkx62rikc5f7pfaz6fnsygvnii

Simple but effective techniques for NUMA memory management

W. Bolosky, R. Fitzgerald, M. Scott
1989 Proceedings of the twelfth ACM symposium on Operating systems principles - SOSP '89  
Multiprocessors with non-uniform memory access times introduce the problem of placing data near the processes that use them, in order to improve performance.  ...  It also suggests that the greatest leverage for further performance improvement lies in reducing false sharing, which occurs when the same page contains objects that would best be placed in different memories  ...  , to Bob Marinelli for early help in bring up the ACE, to the entire Mach crew at CMU for stimulating and lively discussion regarding the relationship of Mach, the pmap interface and NUMA machines, and  ... 
doi:10.1145/74850.74854 dblp:conf/sosp/BoloskyFS89 fatcat:ditm7jm6djgvzlsyulpcmq6yry

Simple but effective techniques for NUMA memory management

W. Bolosky, R. Fitzgerald, M. Scott
1989 ACM SIGOPS Operating Systems Review  
Multiprocessors with non-uniform memory access times introduce the problem of placing data near the processes that use them, in order to improve performance.  ...  It also suggests that the greatest leverage for further performance improvement lies in reducing false sharing, which occurs when the same page contains objects that would best be placed in different memories  ...  , to Bob Marinelli for early help in bring up the ACE, to the entire Mach crew at CMU for stimulating and lively discussion regarding the relationship of Mach, the pmap interface and NUMA machines, and  ... 
doi:10.1145/74851.74854 fatcat:bhqjesruzvb37hbzvdzamx7ara

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units [article]

Bongjoon Hyun, Youngeun Kwon, Yujeong Choi, John Kim, Minsoo Rhu
2019 arXiv pre-print
To satisfy the compute and memory demands of deep neural networks, neural processing units (NPUs) are widely being utilized for accelerating deep learning algorithms.  ...  Through a careful data-driven application characterization study, we root-cause several limitations of prior GPU-centric address translation schemes and propose a memory management unit (MMU) that is tailored  ...  To overcome the memory capacity bottleneck, [figure residue: bar chart comparing Baseline, NUMA(slow), and NUMA(fast)]  ... 
arXiv:1911.06859v1 fatcat:pyzkc6lh55gslf3kzzgseddt5q

SALSA

Elad Gidron, Idit Keidar, Dmitri Perelman, Yonathan Perez
2012 Proceedings of the 24th ACM symposium on Parallelism in algorithms and architectures - SPAA '12  
SALSA manages large chunks of tasks, which improves locality and facilitates stealing.  ...  SALSA uses a novel approach for coordination among consumers, without strong atomic operations or memory barriers in the fast path. It invokes only two CAS operations during a chunk steal.  ...  We are especially interested in a management policy suitable for NUMA architectures (see Figure 1) , where each CPU has its own memory, and memories of other CPUs are accessed over an interconnect.  ... 
doi:10.1145/2312005.2312035 dblp:conf/spaa/GidronKPP12 fatcat:egphobpsdvbl5b3xm7ltduzqyq

Scalable NUMA-Aware Wilson-Dirac on Supercomputers

Claude Tadonki
2017 2017 International Conference on High Performance Computing & Simulation (HPCS)  
In order to lower the latter, explicit shared memory implementations should be considered at the level of a compute node, since this will lead to a less complex data communication graph and thus (at least  ...  We focus on this aspect and propose a novel efficient NUMA-aware scheduling, together with a combination of the major HPC strategies for large-scale LQCD.  ...  Thanks to Christine Einsenbeis from INRIA for our regular discussions about LQCD implementations, and to my PhD student Adilla Susungi for the same about NUMA considerations.  ... 
doi:10.1109/hpcs.2017.56 dblp:conf/ieeehpcs/Tadonki17 fatcat:5t57tywdsnhunpxwytullhkeia

Harris corner detection on a NUMA manycore

Olfa Haggui, Claude Tadonki, Lionel Lacassagne, Fatma Sayadi, Bouraoui Ouni
2018 Future Generation Computer Systems  
The corresponding data access patterns follow a stencil model, which is known to require careful memory organization and management.  ...  In this paper, we study a direct and explicit implementation of common and novel optimization strategies, and provide a NUMA-aware parallelization.  ...  [25] analyze the core of the computation and an efficient management of the data, compiler-based vectorization and shared memory parallelism are also studied.  ... 
doi:10.1016/j.future.2018.01.048 fatcat:u6llpftkwveqjgf27ms6xqhtxi

A Scalable Physical Memory Allocation Scheme for L4 Microkernel

Chen Tian, Daniel Waddington, Jilong Kuang
2012 2012 IEEE 36th Annual Computer Software and Applications Conference  
In this work, we first study the scalability issue of the PMA implementation in L4 microkernels, and propose our solution in the context of Fiasco.OC, a state-of-the-art L4 microkernel implementation.  ...  We also discuss how to leverage the L4 microkernel design advantages to implement a PMA with more advanced features, such as load balancing, customizability and NUMA-awareness.  ...  The amount of memory managed by kernel PMA however is only 8% of the total by default. The rest of the memory is managed by a user level PMA and used by applications. This work focuses on this PMA.  ... 
doi:10.1109/compsac.2012.85 dblp:conf/compsac/TianWK12 fatcat:kjtzwy2pdfhwhlkm33skmrn6ou

Ada and cc-NUMA architectures: what can be achieved with Ada 2005?

A. J. Wellings, A. H. Malik, N. C. Audsley, A. Burns
2010 ACM SIGAda Ada Letters  
We focus on the issue of memory management and memory accesses on a cc-NUMA architecture. A cc-NUMA architecture is chosen, as we believe it to be more scalable than SMP systems.  ...  Real-time systems are finding it difficult to make the shift from single processor systems to multiprocessors because of the lack of support from programming platforms for multiprocessors.  ...  The goal is to enabling robust operating system (OS)-directed motherboard device configuration and power management of both devices and entire systems.  ... 
doi:10.1145/1806546.1806560 fatcat:7yx7igcgwbcinfc7uke36qtip4

Flexible use of memory for replication/migration in cache-coherent DSM multiprocessors

Vijayaraghavan Soundararajan, Mark Heinrich, Ben Verghese, Kourosh Gharachorloo, Anoop Gupta, John Hennessy
1998 SIGARCH Computer Architecture News  
Given the limitations of bus-based multiprocessors, CC-NUMA is the scalable architecture of choice for shared-memory machines.  ...  in part of the local memory.  ...  of this paper.  ... 
doi:10.1145/279361.279403 fatcat:doy7vvwsrvekjfrrgroip2c6gm

Multi-core, main-memory joins

Cagri Balkesen, Gustavo Alonso, Jens Teubner, M. Tamer Özsu
2013 Proceedings of the VLDB Endowment  
In this paper we experimentally study the performance of main-memory, parallel, multi-core join algorithms, focusing on sort-merge and (radix-)hash join.  ...  This claim is justified based on the width of SIMD instructions (sort-merge outperforms radix-hash join once SIMD is sufficiently wide), and NUMA awareness (sort-merge is superior to hash join in NUMA  ...  Acknowledgements This work was supported by the Swiss National Science Foundation (Ambizione grant; project Avalanche), by the Enterprise Computing Center (ECC) of ETH Zurich, and by Deutsche Forschungsgemeinschaft  ... 
doi:10.14778/2732219.2732227 fatcat:v6q7kdmarbfkpcyaj47xcp2ltq

Preface

Richard E. Harper
2001 IBM Journal of Research and Development  
The total worldwide server customer revenue for the year 2000 for 32-bit Intel Architecture-based servers was $26 billion, with an annual growth rate of 20%.  ...  Yet even higher levels of capability can and should be delivered to our customers, especially in the areas of performance, scalability, cost, and RAS. For example,  ...  The requirements and opportunities of such management are discussed in the paper.  ... 
doi:10.1147/rd.452.0187 fatcat:hvlpudge75b3xeqzvbbbxl3mem
Showing results 1-15 of 1,191