Improving Phase Change Memory Performance with Data Content Aware Access [article]

Shihao Song, Anup Das, Onur Mutlu, Nagarajan Kandasamy
2020 pre-print
A prominent characteristic of write operation in Phase-Change Memory (PCM) is that its latency and energy are sensitive to the data to be written as well as the content that is overwritten.  ...  Third, it re-initializes unused memory locations with known all-zeros or all-ones content in a manner that does not interfere with regular read and write accesses.  ...  Quantifying Energy Improvement Data Content Aware Access in PCM We describe DATACON in the context of DRAM-PCM hybrid memory, where embedded DRAM (eDRAM) is used as a write cache to PCM main memory.  ... 
doi:10.1145/3381898.3397210 arXiv:2005.04753v1 fatcat:6szb65djgzhsnhguipjywivnv4

Design Methodologies for Reliable and Energy-efficient PCM Systems [article]

Shihao Song, Anup Das
2020 arXiv pre-print
Phase-change memory (PCM) is a scalable and low latency non-volatile memory (NVM) technology that has been proposed to serve as storage class memory (SCM), providing low access latency similar to DRAM  ...  In this work, we propose methodologies to tackle the bottlenecks, improving performance, reliability, energy consumption, and sustainability for a PCM system.  ...  Examples of emerging non-volatile memory technologies include phase-change memory (PCM), resistive random access memory (RRAM), ferroelectric random access memory (FeRAM), and magnetic random access memory  ... 
arXiv:2011.13959v1 fatcat:sjwdb2ejtjduxn54be6fdojll4

Demystifying the Performance of HPC Scientific Applications on NVM-based Memory Systems [article]

Ivy Peng, Kai Wu, Jie Ren, Dong Li, Maya Gokhale
2020 arXiv pre-print
Second, we demonstrate that write-aware data placement on uncached-NVM could achieve 2x performance improvement with a 60  ...  Current NVM technologies have lower performance than DRAM and, thus, are often paired with DRAM in a heterogeneous main memory. Recently, byte-addressable NVM hardware becomes available.  ...  We identify concurrency contention on NVM by comparing the performance changes at different concurrency across memory configurations.  ... 
arXiv:2002.06499v1 fatcat:m6p6biiqincndb3bfn57lvxtzq

Providing Fairness in Heterogeneous Multicores with a Predictive, Adaptive Scheduler

Saeid Barati, Hank Hoffmann
2016 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)  
This paper proposes augmenting existing contention-aware approaches with predictive and adaptive components to provide fair memory access and performance improvements on heterogeneous multicores.  ...  We find that adding prediction improves fairness and performance by 38% and 4% (respectively) compared to a prior state-of-the-art contention-aware approach.  ...  Acknowledgments: We are grateful to the anonymous reviewers whose suggestions improved the paper. The effort on this project is funded by the U.S.  ... 
doi:10.1109/ipdpsw.2016.28 dblp:conf/ipps/BaratiH16 fatcat:3ho3b6p2ivgcpkreyxo44smlvm

Adaptive Thread Scheduling in Chip Multiprocessors

Ismail Akturk, Ozcan Ozturk
2019 International Journal of Parallel Programming
We introduce an adaptive cache-hierarchy-aware scheduler that tries to schedule threads in a way that interthread contention is minimized.  ...  A novel multi-metric scoring scheme is used which specifies L1 cache access characteristics of threads. Scheduling decisions are made based on these multi-metric scores of threads.  ...  ., all threads may be memory intensive in a particular time). Also, threads may be memory intensive; however, their memory access pattern may change drastically that affects the overall performance.  ... 
doi:10.1007/s10766-019-00637-y fatcat:wwtdfzud2rdelpxt3mcipoajxu

Improving Parallel I/O Performance with Data Layout Awareness

Yong Chen, Xian-He Sun, Rajeev Thakur, Huaiming Song, Hui Jin
2010 2010 IEEE International Conference on Cluster Computing  
I/O systems, and to improve the data access performance.  ...  The proposed layout-aware parallel I/O has a promising potential in improving the I/O performance of parallel systems.  ...  The authors are also grateful to anonymous reviewers for their constructive comments and suggestions that help the further improvement of this work.  ... 
doi:10.1109/cluster.2010.35 dblp:conf/cluster/ChenSTSJ10 fatcat:rf4g26ga6neaxnclbohms5s3qy

A Time-Aware Fault Tolerance Scheme to Improve Reliability of Multilevel Phase-Change Memory in the Presence of Significant Resistance Drift

Wei Xu, Tong Zhang
2011 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
First, based upon information-theoretical study, we show that conventional use of ECC, which is unaware of memory content lifetime, can only achieve the performance with a big gap from the information-theoretical  ...  performance advantages of such time-aware memory fault tolerance strategy in the presence of significant memory cell resistance drift.  ...  This big gap between theoretical bounds and conventional design practice motivates us to investigate the potential of incorporating data lifetime awareness to improve memory fault tolerance performance  ... 
doi:10.1109/tvlsi.2010.2052640 fatcat:arh6cjkgtra3ldfbibetz4z67y

Thread and Data Mapping in Software Transactional Memory: An Overview [article]

Douglas Pereira Pasqualin, Matthias Diener, André Rauber Du Bois, Maurício Lima Pilla
2022 arXiv pre-print
In current microarchitectures, due to the complex memory hierarchies and different latencies on memory accesses, thread and data mapping are important issues to improve application performance.  ...  less predictable and; (2) the STM runtime has precise information about shared data and the intensity with each thread accesses them.  ...  For data mapping, it is necessary to have a global vision of memory pages accessed to be able to perform an optimized data mapping, not only the memory pages accessed by the sharing data protected by STM  ... 
arXiv:2206.01359v1 fatcat:hcfnq77favbrbkezcdr77pfkvy

Accurate Contention-aware Scheduling Method on Clustered Many-core Platform

Shingo Igarashi, Takuro Fukunaga, Takuya Azumi
2021 Journal of Information Processing  
We improved the predictability of contentions by dividing tasks into the memory access phase and the execution phase using a Directed Acyclic Graph (DAG).  ...  Therefore, we addressed contentions induced by shared memory. The ability to predict contentions that may occur during memory access helps to reduce them.  ...  This variable is true when phase X of task τ i and phase Y of task τ j , where memory access is performed, overlap in time.  ... 
doi:10.2197/ipsjjip.29.216 fatcat:mgwjcpqrjjcavndfylqwbfuy7m

Dynamic Virtual Machine Scheduling in Clouds for Architectural Shared Resources

Jeongseob Ahn, Changdae Kim, Jaeung Han, Young-ri Choi, Jaehyuk Huh
2012 USENIX Workshop on Hot Topics in Cloud Computing  
nonuniform memory access (NUMA) affinity, have only relied on intra-system scheduling to reduce contentions on them.  ...  This study shows that live VM migration can be used to mitigate the contentions on micro-architectural resources.  ...  Performance Improvements Our preliminary NUMA-aware scheduler improves the overall performance slightly compared with the cache-aware scheduler.  ... 
dblp:conf/hotcloud/AhnKHCH12 fatcat:7z7jzitwpbagtguievln557fhu

LACIO: A New Collective I/O Strategy for Parallel I/O Systems

Yong Chen, Xian-He Sun, Rajeev Thakur, Philip C. Roth, William D. Gropp
2011 2011 IEEE International Parallel & Distributed Processing Symposium  
We confirm that the new Layout-Aware Collective I/O (LACIO) improves the performance of current parallel I/O systems effectively with the help of noncontiguous file system calls.  ...  It holds promise in improving the I/O performance for parallel systems.  ...  ACKNOWLEDGMENT The authors are thankful to anonymous reviewers' valuable suggestions and comments that help the further improvement of this work.  ... 
doi:10.1109/ipdps.2011.79 dblp:conf/ipps/ChenSTRG11 fatcat:5x5bfjnnxvhqvnwr55pw274jfu

A Survey of Techniques for Reducing Interference in Real-time Applications on Multicore Platforms

Tamara Lugo, Santiago Lozano, Javier Fernandez, Jesus Carretero
2022 IEEE Access  
It covers techniques for reducing contentions in main memory, cache memory, a memory bus, and the integration of interference effects into schedulability analysis.  ...  BWLOCK [158] reduces memory bandwidth contention and improves the performance of soft real-time applications with a controllable performance impact on non-real-time tasks.  ...  This change in cache content will cause a cache miss burst until the cache reloads the instructions and data of the ejected task.  ... 
doi:10.1109/access.2022.3151891 fatcat:vutgetjua5byxczcivmw2esqtq

Memory-conscious collective I/O for extreme scale HPC systems

Yin Lu, Yong Chen, Yu Zhuang, Rajeev Thakur
2013 Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers - ROSS '13  
The preliminary results have demonstrated that this strategy holds promise in mitigating the memory pressure, alleviating the contention for memory bandwidth, and improving the I/O performance for projected  ...  The new collective I/O strategy restricts aggregation data traffic within disjointed subgroups, coordinates I/O accesses in intra-node and inter-node layer, and determines I/O aggregators at run time considering  ...  ACKNOWLEDGMENTS The authors acknowledge the High Performance Computing Center (HPCC) at Texas Tech University at Lubbock for providing HPC resources that have contributed to the research results reported  ... 
doi:10.1145/2491661.2481430 dblp:conf/ics/LuCZT13 fatcat:g6sgsbg3wfe33gjpwddtc3nxq4

A User-Level NUMA-Aware Scheduler for Optimizing Virtual Machine Performance [chapter]

Yuxia Cheng, Wenzhi Chen, Xiao Chen, Bin Xu, Shaoyu Zhang
2013 Lecture Notes in Computer Science  
Commodity servers deployed in the data centers are now typically using the Non-Uniform Memory Access (NUMA) architecture.  ...  Experimental results show that our NUMA-aware virtual machine scheduling algorithm is able to improve VM performance by up to 23.4% compared with the default CFS (Completely Fair Scheduler) scheduler used  ...  resources contention and to maximize system throughput with a balanced memory bandwidth usage.  ... 
doi:10.1007/978-3-642-45293-2_3 fatcat:4pg43s4yy5gb7fv5ilsn75eqfi

On mitigating memory bandwidth contention through bandwidth-aware scheduling

Di Xu, Chenggang Wu, Pen-Chung Yew
2010 Proceedings of the 19th international conference on Parallel architectures and compilation techniques - PACT '10  
However, we found that intra-quantum fine-grained bandwidth contention still happened due to a program's irregular fluctuation in memory access intensity, which is mostly ignored in previous policies.  ...  In this paper, we quantify the impact of bandwidth contention on overall performance.  ...  Numerous approaches have been proposed to reduce the requirement of a program to access memory, thus mitigating the memory bandwidth bottleneck, for example, improving data locality and eliminating useless  ... 
doi:10.1145/1854273.1854306 dblp:conf/IEEEpact/XuWY10 fatcat:3w4ns5z52bdr5fm2snepe4thu4