22,276 Hits in 6.0 sec

On-Chip Mechanisms to Reduce Effective Memory Access Latency [article]

Milad Hashemi
2016 arXiv   pre-print
This dissertation develops hardware that automatically reduces the effective latency of accessing memory in both single-core and multi-core systems.  ...  With these mechanisms, this dissertation demonstrates a 62% increase in performance and a 19% decrease in effective memory access latency for a quad-core processor on a set of high memory intensity workloads  ...  This dissertation focuses on evaluating on-chip mechanisms to reduce memory access latency. Other hardware prefetching mechanisms specifically target the pointers that lead to cache misses.  ... 
arXiv:1609.00306v1 fatcat:hh2lxatnhfdz5mekvmim2p5a24

On-chip mechanisms to reduce effective memory access latency [article]

Milad Olia Hashemi
2016
This dissertation develops hardware that automatically reduces the effective latency of accessing memory in both single-core and multi-core systems.  ...  With these mechanisms, this dissertation demonstrates a 62% increase in performance and a 19% decrease in effective memory access latency for a quad-core processor on a set of high memory intensity workloads  ...  This dissertation focuses on evaluating on-chip mechanisms to reduce memory access latency. Other hardware prefetching mechanisms specifically target the pointers that lead to cache misses.  ... 
doi:10.15781/t2n58cp64 fatcat:6u7uwomkqvgq5aftrhe7am3akm

Understanding Reduced-Voltage Operation in Modern DRAM Devices

Kevin K. Chang, Onur Mutlu, A. Giray Yağlıkçı, Saugata Ghose, Aditya Agrawal, Niladrish Chatterjee, Abhijith Kashyap, Donghyuk Lee, Mike O'Connor, Hasan Hassan
2017 Proceedings of the ACM on Measurement and Analysis of Computing Systems  
Aggressive supply voltage reduction requires a thorough understanding of the effect voltage scaling has on DRAM access latency and DRAM reliability.  ...  Based on our observations, we propose a new DRAM energy reduction mechanism, called Voltron.  ...  Furthermore, low-power modes have a smaller effect on memory-intensive workloads, which exhibit little idleness in memory accesses, whereas, as we showed in Section 6.3, our mechanism is especially effective  ... 
doi:10.1145/3084447 dblp:journals/pomacs/ChangYGACKLOHM17 fatcat:oaonzbkwpjhj5jabnmklgdgmn4

Understanding and Improving the Latency of DRAM-Based Memory Systems [article]

Kevin K. Chang
2017 arXiv   pre-print
We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency.  ...  other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems.  ...  Thus, LISA is an effective substrate that can enable mechanisms to fundamentally reduce memory latency.  ... 
arXiv:1712.08304v1 fatcat:6y2nr2eowvb5fhr7km7azmkioe

Improving DRAM Performance, Security, and Reliability by Understanding and Exploiting DRAM Timing Parameter Margins [article]

Jeremie S. Kim
2021 arXiv   pre-print
devices and it is critical to research more effective solutions to RowHammer.  ...  Finally, we characterize the RowHammer security vulnerability on a wide range of modern DRAM chips while violating the DRAM refresh requirement in order to directly characterize the underlying DRAM technology  ...  mechanisms that rely on a static profile of weak cells to reduce DRAM access latency, and 3) devise new mechanisms that exploit more activation failure characteristics on state-of-the-art LPDDR4 DRAM  ... 
arXiv:2109.14520v1 fatcat:7hhrlz3tfjgx5fekdblfawxf3a

Software-controlled on-chip memory for high-performance and low-power computing

Masaaki Kondo, Motonobu Fujita, Hiroshi Nakamura
2002 SIGARCH Computer Architecture News  
: 32B or 128B • throughput of cache/on-chip memory: 8B/cycle • throughput of off-chip memory: 1B/cycle • off-chip memory access latency: 160 cycles • page size: 4KB • cache: prefetching cache (cache model  ...  [Table: effects on execution time (T_b, T_l, T_t) of on-chip memory features (software controllability, burst data transfer, stride data transfer) and of latency-tolerating techniques of cache]  ...
doi:10.1145/571666.571670 fatcat:lxjkisshubek3mf2hmv3re7ex4

Reducing DRAM Access Latency by Exploiting DRAM Leakage Characteristics and Common Access Patterns [article]

Hasan Hassan
2016 arXiv   pre-print
In this thesis, we develop a low-cost mechanism, called ChargeCache, which enables faster access to recently-accessed rows in DRAM, with no modifications to DRAM chips.  ...  If a later DRAM request hits in that table, the memory controller uses lower timing parameters, leading to reduced DRAM latency.  ...  Our goal in this work is to design a mechanism to reduce the average DRAM access latency without modifying the existing DRAM chips.  ... 
arXiv:1609.07234v1 fatcat:5iuox7vjmndu3dciwbvlzpc5hu
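The ChargeCache entry above describes a small table of recently-accessed row addresses consulted by the memory controller: a hit means the row is still highly charged, so lower timing parameters can be used. A minimal Python sketch of that lookup, with hypothetical table size and timing values, and simple LRU eviction in place of the thesis's timer-based invalidation:

```python
from collections import OrderedDict

class ChargeCacheSketch:
    """Illustrative model of a ChargeCache-style table of recently-accessed
    DRAM rows. Capacity and tRCD values are hypothetical, and eviction here
    is plain LRU rather than the caching-duration expiry used in the thesis."""

    def __init__(self, capacity=128, t_rcd_default=14, t_rcd_lowered=10):
        self.capacity = capacity
        self.t_rcd_default = t_rcd_default    # nominal activation timing (cycles)
        self.t_rcd_lowered = t_rcd_lowered    # reduced timing for charged rows
        self.table = OrderedDict()            # recently-accessed row addresses

    def access(self, row_addr):
        """Return the tRCD to use for this access and record the row."""
        hit = row_addr in self.table
        if hit:
            self.table.move_to_end(row_addr)      # refresh LRU position
        else:
            self.table[row_addr] = True
            if len(self.table) > self.capacity:
                self.table.popitem(last=False)    # evict least-recently-used row
        return self.t_rcd_lowered if hit else self.t_rcd_default
```

A repeat access to the same row hits in the table and is served with the lowered timing; rows pushed out of the table fall back to the default.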

Adaptive-Latency DRAM (AL-DRAM) [article]

Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Khan, Vivek Seshadri, Kevin Chang, Onur Mutlu
2016 arXiv   pre-print
One can therefore reduce latency by adapting the timing parameters to the current operating temperature and the current DIMM that is being accessed.  ...  The key goal of AL-DRAM is to exploit the extra margin that is built into the DRAM timing parameters to reduce DRAM latency.  ...  Placing data based on this information and the latency criticality of data maximizes the benefits of lowering DRAM latency. Error-correction mechanisms to further reduce DRAM latency.  ... 
arXiv:1603.08454v1 fatcat:7y7tvomjhveahaii5ouodskmwu
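The AL-DRAM entry above proposes adapting timing parameters to the current operating temperature and the margin of the specific DIMM being accessed. A one-function sketch of that idea, with an assumed temperature threshold and illustrative cycle counts (not values from the paper):

```python
def adapted_trcd(temperature_c, dimm_margin_cycles,
                 t_rcd_nominal=14, hot_threshold_c=55):
    """Hypothetical AL-DRAM-style adaptation: exploit the extra margin built
    into the nominal timing parameter when the DIMM runs cool, and fall back
    to the nominal value at high temperature. Threshold and cycle counts are
    illustrative only."""
    if temperature_c >= hot_threshold_c:
        return t_rcd_nominal                      # no margin at high temperature
    return t_rcd_nominal - dimm_margin_cycles     # profiled per-DIMM margin
```

For example, a DIMM profiled with 3 cycles of tRCD margin would be accessed with 11 cycles instead of 14 while cool, and with the full 14 cycles once hot.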

Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity [article]

Donghyuk Lee
2016 arXiv   pre-print
Consequently, processors spend a long time waiting to access data from main memory, making the long main memory access latency one of the most critical bottlenecks to achieving high system performance.  ...  into two shorter segments using an isolation transistor, allowing one segment to be accessed with reduced latency.  ...  This leads to an effectively higher amount of charge for the data, thereby reducing access latency to the rows.  ...
arXiv:1604.08041v1 fatcat:zw4nctympra4fp4cwzkiwbb2ca

THREE LEVELS EFFECTIVE MEMORY ACCESS OPTIMIZATION ADDRESSING HIGH LATENCY ISSUES IN MODERN MEMORY DEPENDENT SYSTEMS

Muhammad Yousaf Ali Khan
2020 JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES  
This concludes that the approach has a significant effect on memory access and dynamic buffer use, integrating with the advanced dynamic buffers and effective memory partitioning so as to achieve the  ...  A number of different approaches in the recent past have been adopted to optimize high latency in memory-access applications.  ...  Now the problem of accessing these on-chip memories gives rise to high-latency applications.  ...
doi:10.26782/jmcms.2020.08.00051 fatcat:k3uedtieeffexhdovi66sokg4m

CARAT: Context-aware runtime adaptive task migration for multi core architectures

J Jahn, M A A Faruque, J Henkel
2011 2011 Design, Automation & Test in Europe  
This novel mechanism is built on an in-depth analysis of the memory access behavior of several multimedia and robotic embedded-systems applications.  ...  This work presents a novel context-aware runtime adaptive task migration mechanism (CARAT) that reduces the task migration latency by 93.12%, 97.03%, and 100% compared to three state-of-the-art mechanisms  ...  integrates 13 MiB of on-chip memory.  ...
doi:10.1109/date.2011.5763093 dblp:conf/date/JahnFH11 fatcat:ju2kv4ilpnbbvmczcs6w7dffvi

Practical off-chip meta-data for temporal memory streaming

Thomas F. Wenisch, Michael Ferdman, Anastasia Ailamaki, Babak Falsafi, Andreas Moshovos
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
By using hash-based lookup to prefetch sequences of tens of misses, STMS mitigates main-memory meta-data access latency.  ...  For maximum effectiveness, STMS needs 64MB of meta-data in main memory, a small fraction of memory in servers. • Latency efficiency.  ...  Acknowledgements The authors would like to thank Brian Gold and the anonymous reviewers for their feedback.  ... 
doi:10.1109/hpca.2009.4798239 dblp:conf/hpca/WenischFAFM09 fatcat:qzies3ngwjaetpsnel7mbbclkq
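The STMS entry above describes temporal memory streaming: a log of past miss addresses kept as off-chip meta-data, with a hash-based index that maps a triggering miss to its previous position in the log so the misses that followed it can be prefetched again. A toy model of that lookup-and-replay, with illustrative structure names and a small prefetch depth:

```python
class TemporalStreamSketch:
    """Toy model of STMS-style temporal streaming. The history log and index
    stand in for the off-chip meta-data; sizes, names, and the prefetch depth
    are illustrative, not taken from the paper."""

    def __init__(self, depth=4):
        self.depth = depth     # how many follow-on misses to prefetch
        self.history = []      # global log of miss addresses (meta-data)
        self.index = {}        # miss address -> last position in the log

    def record_miss(self, addr):
        """Called on each cache miss; returns addresses to prefetch."""
        prefetches = []
        pos = self.index.get(addr)
        if pos is not None:
            # Replay the sequence of misses that followed this address before.
            prefetches = self.history[pos + 1 : pos + 1 + self.depth]
        self.index[addr] = len(self.history)
        self.history.append(addr)
        return prefetches
```

After the miss sequence 1, 2, 3, 4 has been logged, a repeat miss on address 1 triggers prefetches for 2 and 3 (at depth 2), hiding their main-memory latency behind the first miss.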

Fine-grained DVFS using on-chip regulators

Stijn Eyerman, Lieven Eeckhout
2011 ACM Transactions on Architecture and Code Optimization (TACO)  
Inspired by these insights, we subsequently propose a fine-grained microarchitecture-driven DVFS mechanism that scales down voltage and frequency upon individual off-chip memory accesses using on-chip  ...  We also demonstrate that the proposed fine-grained DVFS mechanism is orthogonal to existing coarse-grained DVFS policies, and further reduces energy by 6% on average and up to 11% for memory-intensive  ...  We find that fine-grained DVFS is effective at timescales on the order of tens or hundreds of processor cycles, i.e., on the order of the memory access latency.  ...
doi:10.1145/1952998.1952999 fatcat:nwbejbpr7zhh5gfn5ys4n6bzhu
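The fine-grained DVFS entry above rests on a simple energy argument: while the core stalls on an off-chip access, dynamic energy scales roughly as C·V²·f·t, so dropping voltage and frequency for just the stall saves energy with little performance cost. A back-of-the-envelope sketch of that arithmetic, with made-up voltage/frequency points and arbitrary units:

```python
def stall_energy(vdd, freq_ghz, stall_ns, c_eff=1.0):
    """Dynamic energy burned while the core is stalled on memory:
    E = C_eff * V^2 * f * t. All constants are illustrative and the
    units are arbitrary; this is not the paper's energy model."""
    return c_eff * vdd**2 * freq_ghz * stall_ns

# Scaling down to a hypothetical (0.8 V, 1 GHz) point during a 100 ns
# off-chip access, versus staying at (1.0 V, 3 GHz):
saving = 1 - stall_energy(0.8, 1.0, 100) / stall_energy(1.0, 3.0, 100)
```

Under these assumed operating points, roughly three quarters of the stall-time dynamic energy disappears, which is why per-access scaling is attractive only if on-chip regulators can switch at the timescale of a single memory access.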

A NUCA substrate for flexible CMP cache sharing

Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhang, Doug Burger, Stephen W. Keckler
2005 Proceedings of the 19th annual international conference on Supercomputing - ICS '05  
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks.  ...  We show that this organization can support the spectrum of degrees of sharing: unshared, in which each processor has a private portion of the cache, thus reducing hit latency, completely shared, in which  ...  The directory decides, without snooping other L2 caches in the chip, whether to get data from another L2 cache on the chip or whether to issue an off-chip memory request.  ... 
doi:10.1145/1088149.1088154 dblp:conf/ics/HuhKSZBK05 fatcat:f7i3mscyy5g6tfhhc2bs3csmsq

A NUCA substrate for flexible CMP cache sharing

Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhang, Doug Burger, Stephen W. Keckler
2014 25th Anniversary International Conference on Supercomputing Anniversary Volume -  
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks.  ...  We show that this organization can support the spectrum of degrees of sharing: unshared, in which each processor has a private portion of the cache, thus reducing hit latency, completely shared, in which  ...  The directory decides, without snooping other L2 caches in the chip, whether to get data from another L2 cache on the chip or whether to issue an off-chip memory request.  ... 
doi:10.1145/2591635.2667186 fatcat:l6thcibrlbfhpc4nvgwoa5omgi
Showing results 1 — 15 out of 22,276 results