
Strongly Competitive Algorithms for Caching with Pipelined Prefetching [chapter]

Alexander Gaysinsky, Alon Itai, Hadas Shachnai
2001 Lecture Notes in Computer Science  
The goal is to find a policy for prefetching and caching, which minimizes the overall execution time of a given reference sequence.  ...  In the offline case, we show that an algorithm proposed by Cao et al. [6] is optimal for this problem.  ...  Acknowledgments: We thank Anna Karlin, Rajeev Motwani and Prabhakar Raghavan, for valuable discussions on this paper. Thanks also to two anonymous referees for helpful comments and suggestions.  ... 
doi:10.1007/3-540-44676-1_4 fatcat:7e66u6dd55fsxp3rvcxllqlky4

Strongly competitive algorithms for caching with pipelined prefetching

Alexander Gaysinsky, Alon Itai, Hadas Shachnai
2004 Information Processing Letters  
The goal is to find a policy for prefetching and caching, which minimizes the overall execution time of a given reference sequence.  ...  In the offline case, we show that an algorithm proposed by Cao et al. [6] is optimal for this problem.  ...  Acknowledgments: We thank Anna Karlin, Rajeev Motwani and Prabhakar Raghavan, for valuable discussions on this paper. Thanks also to two anonymous referees for helpful comments and suggestions.  ... 
doi:10.1016/j.ipl.2004.03.008 fatcat:rxz77bljcbhnfkri47b3nvga6y
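The offline setting above is easier to picture with a concrete eviction rule. The sketch below is a plain demand-paging simulator using Belady's furthest-in-future eviction. It is an illustrative assumption only: it does not model pipelined prefetching or reproduce the algorithm of Cao et al. analyzed in the paper, and the function name `belady_misses` is hypothetical.

```python
def belady_misses(refs, capacity):
    """Count misses for an offline demand-paging cache that evicts the block
    whose next reference lies furthest in the future (Belady's rule).
    Hypothetical helper for illustration; no prefetching is modeled."""
    # For each position i, record the index of the next reference to refs[i].
    next_use = [0] * len(refs)
    last_seen = {}
    for i in range(len(refs) - 1, -1, -1):
        next_use[i] = last_seen.get(refs[i], float("inf"))
        last_seen[refs[i]] = i

    cache = {}  # block -> index of its next reference
    misses = 0
    for i, block in enumerate(refs):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                # Evict the cached block needed furthest in the future.
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]  # refresh the next-use time on every access
    return misses
```

For example, `belady_misses([1, 2, 3, 1, 2, 4, 1, 2, 3, 4], 3)` incurs five misses: three compulsory ones, then one each for blocks 4 and 3.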

Way Stealing: A Unified Data Cache and Architecturally Visible Storage for Instruction Set Extensions

Theo Kluter, Philip Brisk, Edoardo Charbon, Paolo Ienne
2014 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
Our results show that Way Stealing is competitive in terms of performance and energy consumption with other techniques that use AVS memories in conjunction with a data cache.  ...  Way Stealing is a simple architectural modification to a cache-based processor that increases the data bandwidth to and from application-specific instruction set extensions (ISEs), which increase performance  ...  This could be reasonable for set associative caches with up to four ways, but would be prohibitive for caches with higher associativity.  ... 
doi:10.1109/tvlsi.2012.2236689 fatcat:hojfcgzfgzgifjbxtqkldn5iey

Fast Key-Value Lookups with Node Tracker

Mustafa Cavus, Mohammed Shatnawi, Resit Sendag, Augustus K. Uht
2021 ACM Transactions on Architecture and Code Optimization (TACO)  
In this study, we show that although cache misses are the primary bottleneck for these applications, without a method for eliminating the branch mispredictions only a small fraction of the performance  ...  Our results show that, on average, NT improves single-thread performance by 4.1× when used as a prefetcher; 11.9× as a prefetcher with BOS; 14.9× as a pre-execution unit and 18.8× as a pre-execution unit  ...  BST benefits from branch outcomes strongly on a multi-core with a throughput speedup of 312× compared with the baseline on a single-core.  ... 
doi:10.1145/3452099 fatcat:facltmiss5anfcrmfmabj4jh3u

Resource allocation in a high clock rate microprocessor

Michael Upton, Thomas Huff, Trevor Mudge, Richard Brown
1994 Proceedings of the sixth international conference on Architectural support for programming languages and operating systems - ASPLOS-VI  
This paper discusses the design of a high clock rate (300MHz) processor. The architecture is described, and the goals for the design are explained.  ...  A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models.  ...  For the system level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors the GaAs system must overcome with increased clock speed the CMOS advantage of much higher  ... 
doi:10.1145/195473.195510 dblp:conf/asplos/UptonHMB94 fatcat:4hv2xqb2mfhhzkgsjtcjrwd3dq

Resource allocation in a high clock rate microprocessor

Michael Upton, Thomas Huff, Trevor Mudge, Richard Brown
1994 SIGPLAN Notices  
This paper discusses the design of a high clock rate (300MHz) processor. The architecture is described, and the goals for the design are explained.  ...  A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models.  ...  For the system level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors the GaAs system must overcome with increased clock speed the CMOS advantage of much higher  ... 
doi:10.1145/195470.195510 fatcat:nxe755kxezgb7n7gizjg7czezm

Resource allocation in a high clock rate microprocessor

Michael Upton, Thomas Huff, Trevor Mudge, Richard Brown
1994 ACM SIGOPS Operating Systems Review  
This paper discusses the design of a high clock rate (300MHz) processor. The architecture is described, and the goals for the design are explained.  ...  A cost model is used to estimate the resources required to build processors with varying sizes of on-chip memories, in both single and dual issue models.  ...  For the system level performance of our GaAs chipset to be competitive with that of contemporary CMOS processors the GaAs system must overcome with increased clock speed the CMOS advantage of much higher  ... 
doi:10.1145/381792.195510 fatcat:j3qgqe6cbfafphvaqoyjrchmg4

POWER3: The next generation of PowerPC processors

F. P. O'Connell, S. W. White
2000 IBM Journal of Research and Development  
nonblocking and interleaved data cache, and dual multiply-add-fused floating-point execution units.  ...  There is an insatiable demand for faster computing from practitioners in engineering and scientific fields.  ...  for his compiler enhancements in the area of vector-intrinsic code generation.  ... 
doi:10.1147/rd.446.0873 fatcat:oehxzpxilrbdvk6cchi3wd5zdu

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors

Johannes Hofmann, Dietmar Fey, Michael Riedmann, Jan Eitzinger, Georg Hager, Gerhard Wellein
2016 Concurrency and Computation  
We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient SIMD-vectorized  ...  Using low-level instruction analysis and the execution-cache-memory (ECM) performance model we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution, and predict performance  ...  It features two pipelines: a vector pipeline (U-pipe) with the 512-b vector processing unit attached and a scalar pipeline that handles all remaining instructions.  ... 
doi:10.1002/cpe.3921 fatcat:klpuwdgptre4znlvi6aoourd3e
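The compensation idea behind the Kahan-enhanced dot kernel can be sketched in scalar Python. This is a minimal illustration of Kahan summation applied to a dot product, assuming plain double-precision floats; it says nothing about the SIMD-vectorized implementations or ECM model results in the paper, and the names `kahan_dot` and `naive_dot` are hypothetical. The naive version uses an explicit loop because Python's built-in `sum()` already applies compensated summation to floats from Python 3.12 on.

```python
def naive_dot(x, y):
    """Plain left-to-right dot product (no error compensation)."""
    total = 0.0
    for a, b in zip(x, y):
        total += a * b
    return total


def kahan_dot(x, y):
    """Dot product with Kahan compensated summation of the products."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    total = 0.0
    comp = 0.0  # running compensation: low-order bits lost in earlier additions
    for a, b in zip(x, y):
        term = a * b - comp          # re-inject the previously lost part
        t = total + term             # big + small: low bits of term may be lost
        comp = (t - total) - term    # recover (negated) what was just lost
        total = t
    return total
```

With `x = [1.0] + [1e-16] * 10` and `y = [1.0] * 11`, `naive_dot` returns exactly `1.0` because each tiny product is below half an ulp of 1.0, while `kahan_dot` accumulates the lost bits and lands close to the true value of about `1 + 1e-15`.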

Page 554 of Mathematical Reviews Vol. , Issue 2003A [page]

2003 Mathematical Reviews  
networks (30-32); Uri Zwick, Exact and approximate distances in graphs - a survey (33-48); Alexander Gaysinsky, Alon Itai and Hadas Shachnai, Strongly competitive algorithms for caching with pipelined  ...  prefetching (49-61); David A.  ... 

Interactive Rendering with Coherent Ray Tracing

Ingo Wald, Philipp Slusallek, Carsten Benthin, Markus Wagner
2001 Computer Graphics Forum  
The new algorithm makes better use of computational resources such as caches and SIMD instructions and better exploits image and object space coherence.  ...  Efficient Shading With ray tracing, samples are only shaded after visibility has been determined.  ...  Finally, we are especially indebted to Philippe Beakert for supplying us with nice models, and for his active support in generating the comparison to rasterization hardware.  ... 
doi:10.1111/1467-8659.00508 fatcat:eojbq7lyffc4lnzekfsclxfnxi

The IBM eServer z990 microprocessor

T. J. Slegel, E. Pfeffer, J. A. Magee
2004 IBM Journal of Research and Development  
These features include a new superscalar instruction execution pipeline, high-bandwidth caches, a huge secondary translation-lookaside buffer (TLB), and an onboard cryptographic coprocessor.  ...  However, it makes up for this by having a shorter pipeline and much larger caches and TLBs compared with other processors, along with other performance-enhancing features.  ...  The PAAHT can be thought of as a small and fast TLB that is accessed in parallel with the main D-cache TLB. There are two complete pipelines in the D-cache for processing requests.  ... 
doi:10.1147/rd.483.0295 fatcat:vlgr63m3yvbxxadrlmrcgyhvzy

Identifying the sources of unpredictability in COTS-based multicore systems

Dakshina Dasari, Benny Akesson, Vincent Nelis, Muhammad Ali Awan, Stefan M. Petters
2013 2013 8th IEEE International Symposium on Industrial Embedded Systems (SIES)  
We explore some of the existing work in timing analysis with respect to these features, identify their limitations, and present some unaddressed issues that must be dealt with to ensure safe deployment of  ...  non-amenable to straightforward timing analysis. In this paper, we highlight the architectural features leading to temporal unpredictability, which mainly involve shared hardware resources, such as buses, caches  ...  Modern processors feature large caches with high set-associativity (8-way and 16-way associative caches are not uncommon) and nondeterministic replacement algorithms, making cache analysis extremely challenging  ... 
doi:10.1109/sies.2013.6601469 dblp:conf/sies/DasariANAP13 fatcat:dg5rkpigebce7fkmpmbzry3fhm

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 Proceedings of the 23rd annual international symposium on Computer architecture - ISCA '96  
This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and  ...  In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.  ...  A least-recently-used replacement algorithm was used for the entries in the victim cache. The victim cache had a dramatic effect on the cache miss rates.  ... 
doi:10.1145/232973.232984 dblp:conf/isca/SaulsburyPN96 fatcat:ut72ah2zxzh73onrac3vems5aq

Missing the memory wall

Ashley Saulsbury, Fong Pong, Andreas Nowatzyk
1996 SIGARCH Computer Architecture News  
This paper argues for an integrated system approach that uses less-powerful CPUs that are tightly integrated with advanced memory technologies to build competitive systems with greatly reduced cost and  ...  In this system, small direct mapped instruction caches with long lines are very effective, as are column buffer data caches augmented with a victim cache.  ...  A least-recently-used replacement algorithm was used for the entries in the victim cache. The victim cache had a dramatic effect on the cache miss rates.  ... 
doi:10.1145/232974.232984 fatcat:w5c3hi3725dpdpc76725f5pyqq
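A rough model of a direct-mapped cache augmented with a small fully associative victim cache under least-recently-used replacement, as the snippet describes, is sketched below. The class name, its parameters, and the swap-on-victim-hit policy are assumptions for illustration; the paper's column-buffer data caches are not modeled.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Hypothetical model: a direct-mapped cache whose displaced blocks go into
    a small fully associative victim cache that drops its LRU entry when full.
    On a victim-cache hit the block moves back into the main array."""

    def __init__(self, num_lines, line_size, victim_entries):
        self.num_lines = num_lines
        self.line_size = line_size
        self.victim_entries = victim_entries
        self.lines = [None] * num_lines  # block number held by each line
        # Victim hits remove the entry, so insertion order equals recency order.
        self.victim = OrderedDict()      # block number -> None, oldest first
        self.hits = self.victim_hits = self.misses = 0

    def access(self, addr):
        block = addr // self.line_size
        index = block % self.num_lines
        if self.lines[index] == block:
            self.hits += 1
            return "hit"
        displaced = self.lines[index]
        self.lines[index] = block
        if block in self.victim:
            del self.victim[block]       # block moves back into the main array
            self.victim_hits += 1
            result = "victim_hit"
        else:
            self.misses += 1
            result = "miss"
        if displaced is not None:
            self.victim[displaced] = None          # newest entry goes to the end
            if len(self.victim) > self.victim_entries:
                self.victim.popitem(last=False)    # drop the LRU entry
        return result
```

For example, with four 16-byte lines, addresses 0 and 64 both map to line 0; alternating between them yields two misses and then only victim-cache hits with a single victim entry, whereas the direct-mapped array alone would miss on every access.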