Improving index performance through prefetching

Shimin Chen, Phillip B. Gibbons, Todd C. Mowry
2001 Proceedings of the 2001 ACM SIGMOD international conference on Management of data - SIGMOD '01  
To accelerate searches, pB+-Trees use prefetching to effectively create wider nodes than the natural data transfer size: e.g., eight vs. one cache lines or disk pages.  ...  Our results show that this technique yields over a sixfold speedup on range scans of 1000+ keys.  ...  Since index search is also performed prior to deletion, the entire root-to-leaf path is in the cache.  ...
doi:10.1145/375663.375688 dblp:conf/sigmod/ChenGM01 fatcat:xxo3a6jvvnc2ddjhrsnv6a6f4y

Improving index performance through prefetching

Shimin Chen, Phillip B. Gibbons, Todd C. Mowry
2001 SIGMOD record  
To accelerate searches, pB+-Trees use prefetching to effectively create wider nodes than the natural data transfer size: e.g., eight vs. one cache lines or disk pages.  ...  Our results show that this technique yields over a sixfold speedup on range scans of 1000+ keys.  ...  Since index search is also performed prior to deletion, the entire root-to-leaf path is in the cache.  ...
doi:10.1145/376284.375688 fatcat:emaisdvcfbfk3au6frqc6zszva

Techniques for increasing effective data bandwidth

Christopher Nitta, Matthew Farrens
2008 2008 IEEE International Conference on Computer Design  
By using a simple heuristic to classify the contents of a cache line and providing different compression schemes for each classification, we show it is possible to provide overall compression at a cache  ...  line granularity comparable to that obtained by using a much more complex Lempel-Ziv-Welch algorithm.  ...  number of transactions per unit time (i.e. increasing the bus frequency), and/or by expanding the amount of data transferred per transaction (creating a wider interconnect).  ... 
doi:10.1109/iccd.2008.4751909 dblp:conf/iccd/NittaF08 fatcat:g5kk6d6icfb6xfb7ku43ftbu4e

Exploiting Low Entropy to Reduce Wire Delay

D. Citron
2004 IEEE computer architecture letters  
Reducing the number of wires per bus enables the use of wider wires, which in turn reduces the wire delay.  ...  We propose a stopgap solution to this problem by applying a decade-old technique called bus-expanding to the problem.  ...  We are currently testing the technique on a wider range of BE configurations connecting various units, perfecting a power/performance model, and measuring the exact latency and area requirements of implementing  ...
doi:10.1109/l-ca.2004.7 fatcat:6o66t6ykz5h45mhkzq66nkz4li

A performance comparison of contemporary DRAM architectures

Vinodh Cuppu, Bruce Jacob, Brian Davis, Trevor Mudge
1999 SIGARCH Computer Architecture News  
Our simulations reveal several things: (a) current advanced DRAM technologies are attacking the memory bandwidth problem but not the latency problem; (b) bus transmission speed will soon become a primary  ...  These small-system organizations correspond to workstation-class computers and use on the order of 10 DRAM chips.  ...  A Performance Comparison of Contemporary DRAM Architectures We would like to thank several researchers at IBM who provided helpful insight into the internal workings of the various DRAM architectures:  ... 
doi:10.1145/307338.300998 fatcat:mbcw6ph6mnb65b27khnsaca4zi

Fast, Performance-Optimized Partial Match Address Compression for Low-Latency On-Chip Address Buses

Jiangjiang Liu, Krishnan Sundaresan, Nihar R. Mahapatra
2006 2006 International Conference on Computer Design  
match the higher-order portions of recently-occurring addresses saved in a very small "compression cache" of capacity less than 500 bits.  ...  A previously-proposed scheme called bus expander (BE) supports only a single, fixed-size match for compression.  ...  ACKNOWLEDGMENT This research was supported by US National Science Foundation grant # 0102830.  ... 
doi:10.1109/iccd.2006.4380788 dblp:conf/iccd/LiuSM06 fatcat:f2htvqhvongztglpbe4gskt4qm

Refueling: Preventing Wire Degradation due to Electromigration

Jaume Abella, Xavier Vera, Osman S. Unsal, Oguz Ergin, Antonio González, James W. Tschanz
2008 IEEE Micro  
We use this technique to extend the EM lifetimes of bidirectional wires and power/ground grids by several orders of magnitude without requiring wider wires.  ...  Evaluation We evaluated an example of a core in which we apply refueling to both the data bus between the data (DL0) cache and the unified (UL1) cache, and the on-chip bus between the UL1 cache and memory  ... 
doi:10.1109/mm.2008.92 fatcat:fogq7typpfaprnsxdxer25vjgm

An Off-Chip Attack on Hardware Enclaves via the Memory Bus [article]

Dayeol Lee, Dongha Jung, Ian T. Fang, Chia-Che Tsai, Raluca Ada Popa
2019 arXiv   pre-print
We introduce three techniques, critical page whitelisting, cache squeezing, and oracle-based fuzzy matching algorithm to increase cache misses for memory accesses that are useful for the attack, with no  ...  First, DRAM requests are only visible on the memory bus at last-level cache misses.  ...  Other techniques such as FLUSH+RELOAD [45] and FLUSH+FLUSH [46] use a shared cache block between the attacker and the victim to create a noiseless and lossless side channel.  ... 
arXiv:1912.01701v1 fatcat:nj6kipl65zewtd4tn6x6p6gzse

Centip3De: A Cluster-Based NTC Architecture With 64 ARM Cortex-M3 Cores in 3D Stacked 130 nm CMOS

David Fick, Ronald G. Dreslinski, Bharan Giridhar, Gyouho Kim, Sangwon Seo, Matthew Fojtik, Sudhir Satpathy, Yoonmyung Lee, Daeyeon Kim, Nurrachman Liu, Michael Wieckowski, Gregory Chen (+3 others)
2013 IEEE Journal of Solid-State Circuits  
We present Centip3De, a large-scale 3D CMP with a cluster-based near-threshold computing (NTC) architecture. Centip3De uses a 3D stacking technology in conjunction with 130 nm CMOS.  ...  This project demonstrates the feasibility of large-scale 3D design, a synergy between 3D and NTC architectures, a unique cluster-based NTC cache design, and how to maximize performance in a thermally-constrained  ...  A wider system-level analysis is performed in Table III.  ...
doi:10.1109/jssc.2012.2222814 fatcat:3wc3rliiajddnmk432u5223ldm

High-performance DRAMs in workstation environments

V. Cuppu, B. Jacob, B. Davis, T. Mudge
2001 IEEE transactions on computers  
These small-system organizations correspond to workstation-class computers and use only a handful of DRAM chips (~10, as opposed to ~1 or ~100).  ...  for low- and medium-speed CPUs (1GHz and below); and 5) as we move to wider buses, row access time becomes more prominent, making it important to investigate techniques to exploit the available locality  ...  ACKNOWLEDGMENTS This study grew out of research begun by Brian Davis and extended by Vinodh Cuppu, Özkan Dikmen, and Rohit Grover in a graduate-level architecture class taught by Professor Jacob in  ...
doi:10.1109/12.966491 fatcat:r4glk3j7unerpkkmuetfwl5yeq

FlexDEF: Development framework for processor architecture implementation and evaluation

Kasyab P. Subramaniyan, Erik Ryman, Magnus Sjalander, Tung Thanh Hoang, Mafijul Md Islam, Per Larsson-Edefors
2011 2011 7th Conference on Ph.D. Research in Microelectronics and Electronics  
We present the FlexCore Design Exploration Framework (FlexDEF), an end-to-end tool-chain used to develop the FlexCore processor and its accompanying cache system.  ...  Designing a processor is a complex task that uses multiple and varied tools. The complete development cycle spans software as well as hardware design and verification.  ...  Fig. 3 shows the Finite State Machines (FSMs) used in the instruction and data cache controllers, created for this purpose.  ... 
doi:10.1109/prime.2011.5966211 fatcat:lkybjq5cibdjjkevlvcf2praoe

POWER2 fixed-point, data cache, and storage control units

D. J. Shippy, T. W. Griffith
1994 IBM Journal of Research and Development  
The POWER2 fixed-point, data cache, and storage control units provide a tightly integrated subunit for a second-generation high-performance superscalar RISC processor.  ...  , increased bandwidth into and out of the caches through wider data buses, an improved external interrupt mechanism, and an improved I/O DMA mechanism to support multiple-streaming Micro Channels.  ...  The cache is a multiported design which uses a virtual multiport technique [6] and a standard single-port cell macro.  ...
doi:10.1147/rd.385.0503 fatcat:d5wj2h3mzbccdfpgsqabc2zvwm

Page overlays

Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry, Trishul Chilimbi
2015 SIGARCH Computer Architecture News  
We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques.  ...  In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page.  ...  Hardware Cost and OS Changes There are three sources of hardware overhead in our design: 1) the OMT Cache, 2) wider TLB entries (to store the OBitVector), and 3) wider cache tags (due to the wider physical  ...
doi:10.1145/2872887.2750379 fatcat:mrlugnl6yzbn3ikfanuj2myuja

Page overlays

Vivek Seshadri, Gennady Pekhimenko, Olatunji Ruwase, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry, Trishul Chilimbi
2015 Proceedings of the 42nd Annual International Symposium on Computer Architecture - ISCA '15  
We propose a new virtual memory framework that enables efficient implementation of a variety of fine-grained memory management techniques.  ...  In our framework, each virtual page can be mapped to a structure called a page overlay, in addition to a regular physical page. An overlay contains a subset of cache lines from the virtual page.  ...  Hardware Cost and OS Changes There are three sources of hardware overhead in our design: 1) the OMT Cache, 2) wider TLB entries (to store the OBitVector), and 3) wider cache tags (due to the wider physical  ...
doi:10.1145/2749469.2750379 dblp:conf/isca/SeshadriPRMGKMC15 fatcat:e2pocasn25b7pnzbybmiszy6ry

Hardware support for WCET analysis of hard real-time multicore systems

Marco Paolieri, Eduardo Quiñones, Francisco J. Cazorla, Guillem Bernat, Mateo Valero
2009 SIGARCH Computer Architecture News  
Multicore processors represent a good design solution for such systems due to their high performance, low cost and power consumption characteristics.  ...  In this paper we propose a multicore architecture with shared resources that allows the execution of applications with hard real-time and non hard real-time constraints at the same time, providing time  ...  Our ICBA splits wide bus transfers into independent requests so they can be sent in non-consecutive bus slots. In this way we allow bus transfers wider than the bus bandwidth.  ...
doi:10.1145/1555815.1555764 fatcat:m7rjwil5angjnhaltopzlnlszi