A PDF copy of this work was available on the public web and is preserved in the Wayback Machine; the capture dates from 2012, and the original URL can also be visited.
Strongly Competitive Algorithms for Caching with Pipelined Prefetching
[chapter]
2001
Lecture Notes in Computer Science
The goal is to find a policy for prefetching and caching, which minimizes the overall execution time of a given reference sequence. ...
In the offline case, we show that an algorithm proposed by Cao et al. [6] is optimal for this problem. ...
Acknowledgments: We thank Anna Karlin, Rajeev Motwani and Prabhakar Raghavan, for valuable discussions on this paper. Thanks also to two anonymous referees for helpful comments and suggestions. ...
doi:10.1007/3-540-44676-1_4
fatcat:7e66u6dd55fsxp3rvcxllqlky4
Strongly competitive algorithms for caching with pipelined prefetching
2004
Information Processing Letters
The goal is to find a policy for prefetching and caching, which minimizes the overall execution time of a given reference sequence. ...
In the offline case, we show that an algorithm proposed by Cao et al. [6] is optimal for this problem. ...
Acknowledgments: We thank Anna Karlin, Rajeev Motwani and Prabhakar Raghavan, for valuable discussions on this paper. Thanks also to two anonymous referees for helpful comments and suggestions. ...
doi:10.1016/j.ipl.2004.03.008
fatcat:rxz77bljcbhnfkri47b3nvga6y
Way Stealing: A Unified Data Cache and Architecturally Visible Storage for Instruction Set Extensions
2014
IEEE Transactions on Very Large Scale Integration (vlsi) Systems
Our results show that Way Stealing is competitive in terms of performance and energy consumption with other techniques that use AVS memories in conjunction with a data cache. ...
Way Stealing is a simple architectural modification to a cache-based processor that increases the data bandwidth to and from application-specific instruction set extensions (ISEs), which increase performance ...
This could be reasonable for set associative caches with up to four ways, but would be prohibitive for caches with higher associativity. ...
doi:10.1109/tvlsi.2012.2236689
fatcat:hojfcgzfgzgifjbxtqkldn5iey
Fast Key-Value Lookups with Node Tracker
2021
ACM Transactions on Architecture and Code Optimization (TACO)
In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions, only a small fraction of the performance ...
Our results show that, on average, NT improves single-thread performance by 4.1× when used as a prefetcher; 11.9× as a prefetcher with BOS; 14.9× as a pre-execution unit and 18.8× as a pre-execution unit ...
BST benefits strongly from branch outcomes on a multi-core, with a throughput speedup of 312× compared with the baseline on a single core. ...
doi:10.1145/3452099
fatcat:facltmiss5anfcrmfmabj4jh3u
Resource allocation in a high clock rate microprocessor
1994
Proceedings of the sixth international conference on Architectural support for programming languages and operating systems - ASPLOS-VI
This paper discusses the design of a high clock rate (300 MHz) processor. The architecture is described, and the goals for the design are explained. ...
A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models. ...
For the system-level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors, the GaAs system must use its increased clock speed to overcome the CMOS advantage of much higher ...
doi:10.1145/195473.195510
dblp:conf/asplos/UptonHMB94
fatcat:4hv2xqb2mfhhzkgsjtcjrwd3dq
Resource allocation in a high clock rate microprocessor
1994
SIGPLAN notices
This paper discusses the design of a high clock rate (300 MHz) processor. The architecture is described, and the goals for the design are explained. ...
A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models. ...
For the system-level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors, the GaAs system must use its increased clock speed to overcome the CMOS advantage of much higher ...
doi:10.1145/195470.195510
fatcat:nxe755kxezgb7n7gizjg7czezm
Resource allocation in a high clock rate microprocessor
1994
ACM SIGOPS Operating Systems Review
This paper discusses the design of a high clock rate (300 MHz) processor. The architecture is described, and the goals for the design are explained. ...
A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models. ...
For the system-level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors, the GaAs system must use its increased clock speed to overcome the CMOS advantage of much higher ...
doi:10.1145/381792.195510
fatcat:j3qgqe6cbfafphvaqoyjrchmg4
POWER3: The next generation of PowerPC processors
2000
IBM Journal of Research and Development
nonblocking and interleaved data cache, and dual multiply-add-fused floating-point execution units. ...
There is an insatiable demand for faster computing from practitioners in engineering and scientific fields. ...
for his compiler enhancements in the area of vector-intrinsic code generation. ...
doi:10.1147/rd.446.0873
fatcat:oehxzpxilrbdvk6cchi3wd5zdu
Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors
2016
Concurrency and Computation
We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized ...
Using low-level instruction analysis and the execution-cache-memory (ECM) performance model we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution, and predict performance ...
It features two pipelines: a vector pipeline (U-pipe) with the 512-bit vector processing unit attached and a scalar pipeline that handles all remaining instructions. ...
doi:10.1002/cpe.3921
fatcat:klpuwdgptre4znlvi6aoourd3e
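The compensated dot-product kernel described in this entry can be illustrated with a short sketch of Kahan's compensated summation applied to a dot product. This is a plain scalar version, not the SIMD-vectorized kernels the paper analyzes, and the function name is ours:

```python
def kahan_dot(x, y):
    """Dot product with Kahan compensation: a running correction term
    captures the low-order bits lost by each floating-point addition."""
    s = 0.0  # accumulated sum
    c = 0.0  # compensation: rounding error not yet folded into s
    for xi, yi in zip(x, y):
        term = xi * yi - c  # apply the pending correction
        t = s + term        # big + small: low-order bits of term may be lost
        c = (t - s) - term  # recover what was lost (algebraically zero)
        s = t
    return s
```

On a cancellation-heavy input such as `[1e16, 1.0, 1.0, -1e16]` taken elementwise against ones, a naive left-to-right running sum returns 0.0 (each `+ 1.0` is absorbed by the large term), while the compensated version recovers 2.0.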
Page 554 of Mathematical Reviews Vol. , Issue 2003A
[page]
2003
Mathematical Reviews
networks (30-32); Uri Zwick, Exact and approximate distances in graphs—a survey (33-48); Alexander Gaysinsky, Alon Itai and Hadas Shachnai, Strongly competitive algorithms for caching with pipelined prefetching (49-61); David A. ...
Interactive Rendering with Coherent Ray Tracing
2001
Computer graphics forum (Print)
The new algorithm makes better use of computational resources such as caches and SIMD instructions and better exploits image and object space coherence. ...
Efficient Shading With ray tracing, samples are only shaded after visibility has been determined. ...
Finally, we are especially indebted to Philippe Bekaert for supplying us with nice models, and for his active support in generating the comparison to rasterization hardware. ...
doi:10.1111/1467-8659.00508
fatcat:eojbq7lyffc4lnzekfsclxfnxi
The IBM eServer z990 microprocessor
2004
IBM Journal of Research and Development
These features include a new superscalar instruction execution pipeline, highbandwidth caches, a huge secondary translation-lookaside buffer (TLB), and an onboard cryptographic coprocessor. ...
However, it makes up for this by having a shorter pipeline and much larger caches and TLBs compared with other processors, along with other performance-enhancing features. ...
The PAAHT can be thought of as a small and fast TLB that is accessed in parallel with the main D-cache TLB. There are two complete pipelines in the D-cache for processing requests. ...
doi:10.1147/rd.483.0295
fatcat:vlgr63m3yvbxxadrlmrcgyhvzy
Identifying the sources of unpredictability in COTS-based multicore systems
2013
2013 8th IEEE International Symposium on Industrial Embedded Systems (SIES)
We explore some of the existing work in timing analysis with respect to these features, identify their limitations, and present some unaddressed issues that must be dealt with to ensure safe deployment of ...
non-amenable to straightforward timing analysis. In this paper, we highlight the architectural features leading to temporal unpredictability, which mainly involve shared hardware resources, such as buses, caches ...
Modern processors feature large caches with high set-associativity (8-way and 16-way associative caches are not uncommon) and nondeterministic replacement algorithms, making cache analysis extremely challenging ...
doi:10.1109/sies.2013.6601469
dblp:conf/sies/DasariANAP13
fatcat:dg5rkpigebce7fkmpmbzry3fhm
Missing the memory wall
1996
Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96
This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and ...
In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache. ...
A least-recently-used replacement algorithm is used for the entries in the victim cache. The victim cache had a dramatic effect on the cache miss rates. ...
doi:10.1145/232973.232984
dblp:conf/isca/SaulsburyPN96
fatcat:ut72ah2zxzh73onrac3vems5aq
Missing the memory wall
1996
SIGARCH Computer Architecture News
This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and ...
In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache. ...
A least-recently-used replacement algorithm is used for the entries in the victim cache. The victim cache had a dramatic effect on the cache miss rates. ...
doi:10.1145/232974.232984
fatcat:w5c3hi3725dpdpc76725f5pyqq
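The victim-cache arrangement sketched in the two entries above — a direct-mapped cache whose displaced blocks get a second chance in a small, fully associative, LRU-managed victim cache — can be modeled roughly as follows. All parameters, names, and sizes here are illustrative, not taken from the paper:

```python
from collections import OrderedDict

class VictimCache:
    """Direct-mapped main cache plus a small fully associative victim
    cache with LRU replacement; blocks displaced from the main cache
    fall into the victim cache instead of being discarded outright."""

    def __init__(self, num_sets=256, victim_entries=4):
        self.num_sets = num_sets
        self.victim_entries = victim_entries
        self.main = {}               # set index -> resident block address
        self.victim = OrderedDict()  # block address -> None, in LRU order

    def access(self, addr):
        """Classify one block-address access as 'hit', 'victim_hit', or 'miss'."""
        idx = addr % self.num_sets
        if self.main.get(idx) == addr:
            return "hit"
        if addr in self.victim:
            # Victim hit: swap the block back into the main cache.
            del self.victim[addr]
            self._fill(idx, addr)
            return "victim_hit"
        self._fill(idx, addr)
        return "miss"

    def _fill(self, idx, addr):
        # The block displaced from the main cache (if any) drops into
        # the victim cache as the most recently used entry.
        evicted = self.main.get(idx)
        if evicted is not None:
            self.victim[evicted] = None
            self.victim.move_to_end(evicted)
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)  # evict the LRU entry
        self.main[idx] = addr
```

Two blocks that conflict in the direct-mapped array (e.g. addresses 0 and 256 with 256 sets) would thrash as repeated misses without the victim cache; with it, after the two cold misses every subsequent alternating access is a victim hit.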
Showing results 1 — 15 out of 348 results