
How useful are non-blocking loads, stream buffers and speculative execution in multiple issue processors?

K.I. Farkas, N.P. Jouppi, P. Chow
Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture  
Abstract: We investigate the relative performance impact of non-blocking loads, stream buffers, and speculative execution both used individually and in conjunction with each other.  ...  The addition of speculative execution further improves the performance of the systems that we have simulated, with or without non-blocking loads and stream buffers, by an additional 20% to 40%.  ...  In this paper, we consider non-blocking loads, stream buffers, and speculative loads.  ... 
doi:10.1109/hpca.1995.386553 dblp:conf/hpca/FarkasJC95 fatcat:smjsk4sytfa4vin7ybpavbyzty

Multithreaded Processors

T. Ungerer
2002 Computer journal  
The execution units are multiplexed between the threads in the register sets.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  Smaller buffers limit the speculation depth of fetched instructions and lead to the fact that only non-speculative instructions or instructions with low speculation depth are fetched, decoded, issued and  ... 
doi:10.1093/comjnl/45.3.320 fatcat:hlkkabuhrzhkrmuyqomzfmc6zm

Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
The execution units are multiplexed between the threads in the register sets.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  Smaller buffers limit the speculation depth of fetched instructions and lead to the fact that only non-speculative instructions or instructions with low speculation depth are fetched, decoded, issued and  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

Slipstream processors

Karthik Sundaramoorthy, Zach Purser, Eric Rotenberg
2000 SIGPLAN notices  
The delay buffer is used to communicate control and data flow outcomes from A-stream to R-stream (the R-stream is "delayed" with respect to the A-stream [24]). 4.  ...  Note that two kinds of speculation occur in the A-stream. Conventional speculation occurs when branches are predicted and the branch-related computation has not been removed from the A-stream.  ...  We are grateful to Jim Smith for suggesting the name "slipstream" and pointing out the useful car racing analogy.  ... 
doi:10.1145/356989.357013 fatcat:vs4txm2jsbhfzfegv3drxfo4c4

Multiscalar processors

Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependence.  ...  The tasks are distributed to a number of parallel processing units which reside within a processor complex. Each of these units fetches and executes instructions belonging to its assigned task.  ...  We would like to thank Jim Smith for his contributions to the multiscalar project in general, and this paper in particular.  ... 
doi:10.1145/223982.224451 dblp:conf/isca/SohiBV95 fatcat:tdhpug2w5be4bgpluwrdi4elt4

Multiscalar processors

Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
1995 SIGARCH Computer Architecture News  
Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependence.  ...  The tasks are distributed to a number of parallel processing units which reside within a processor complex. Each of these units fetches and executes instructions belonging to its assigned task.  ...  We would like to thank Jim Smith for his contributions to the multiscalar project in general, and this paper in particular.  ... 
doi:10.1145/225830.224451 fatcat:55gupm24cnhkhd7utva32dqwqu

Multiscalar processors

Gurindar S. Sohi, Scott E. Breach, T. N. Vijaykumar
1998 25 years of the international symposia on Computer architecture (selected papers) - ISCA '98  
Memory accesses may occur speculatively without knowledge of preceding loads or stores. Addresses are disambiguated dynamically, many in parallel, and processing waits only for true data dependence.  ...  The tasks are distributed to a number of parallel processing units which reside within a processor complex. Each of these units fetches and executes instructions belonging to its assigned task.  ...  We would like to thank Jim Smith for his contributions to the multiscalar project in general, and this paper in particular.  ... 
doi:10.1145/285930.286010 dblp:conf/isca/SohiBV98 fatcat:tba7kr7pkbb6viigq73o57k5pq

Idempotent processor architecture

Marc de Kruijf, Karthikeyan Sankaralingam
2011 Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture - MICRO-44 '11  
In this paper, we demonstrate how idempotent processing simplifies the design of in-order processors.  ...  This paper presents a new processor architecture, the idempotent processor architecture, that advances both of these directions by presenting a new execution paradigm that allows speculative execution  ...  Idempotent Processor Lean is our idempotent processor configuration that does not include an SDB or non-blocking cache while Idempotent Processor Fast includes a 4-entry SDB and a non-blocking cache.  ... 
doi:10.1145/2155620.2155637 dblp:conf/micro/KruijfS11 fatcat:cktbx6nww5gavorpghak76g72e

Slice-processors

Andreas Moshovos, Dionisios N. Pnevmatikatos, Amirali Baniasadi
2001 Proceedings of the 15th international conference on Supercomputing - ICS '01  
Slice processors are a generalization of existing operation-based prefetching mechanisms such as stream buffers where the operation itself is fixed in the design (e.g., address + stride).  ...  Such slices are then executed in-parallel with the main sequential thread prefetching memory data.  ...  This work was supported in part by an NSF CAREER award and by funds from the University of Toronto.  ... 
doi:10.1145/377792.377856 dblp:conf/ics/MoshovosPB01 fatcat:etizvwwumfffhpzk5bfl2y72ji

A study of slipstream processors

Zach Purser, Karthik Sundaramoorthy, Eric Rotenberg
2000 Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture - MICRO 33  
Both programs are sped up: combined, they outperform conventional non-redundant execution. We study slipstreaming with the following key results. 1.  ...  As execution bandwidth is increased, slipstreaming provides less of a performance advantage, unless instructions are removed in the short program before they are fetched.  ...  reorder buffer: 64, 128, or 256 entries dispatch/issue/retire bandwidth: 4-/8-/16-way n fully-symmetric function units (n = issue b/w) n loads/stores per cycle (n = issue b/w) gshare-indexed (16 bits of  ... 
doi:10.1145/360128.360155 fatcat:ortvhf3qffa7tgm4kv7blp5rvy

Disjoint out-of-order execution processor

Mageda Sharafeddine, Komal Jothi, Haitham Akkary
2012 ACM Transactions on Architecture and Code Optimization (TACO)  
High-performance superscalar architectures used to exploit instruction level parallelism in single-thread applications have become too complex and power hungry for the multicore processors era.  ...  Hence we call this style of execution Disjoint Out-of-Order Execution (DOE). DOE uses latency tolerance to overcome performance issues of SpMT caused by interthread data dependences.  ...  Speculative Cache and Load and Store Execution: Like other SpMT proposals, DOE allows multiple speculative cache block versions to reside simultaneously in the distributed L1 data caches of the various  ... 
doi:10.1145/2355585.2355592 fatcat:3mrp3fyihfgtnitoli35mhmtdy

A survey of processors with explicit multithreading

Theo Ungerer, Borut Robič, Jurij Šilc
2003 ACM Computing Surveys  
The execution units are multiplexed between the thread contexts that are loaded in the register sets.  ...  Several multithreaded processors are announced by industry or already into production in the areas of high-performance microprocessors, media, and network processors.  ...  Each logical processor can use up to a maximum of 63 reorder buffer entries, 24 load buffers, and 12 store buffer entries.  ... 
doi:10.1145/641865.641867 fatcat:u6x7jdmkfvexnm3culskjsoxwi

The microarchitecture of superscalar processors

J.E. Smith, G.S. Sohi
1995 Proceedings of the IEEE  
By exploiting instruction-level parallelism, superscalar processors are capable of executing more than one instruction in a clock cycle.  ...  for parallel execution, 4) the communication of data values through memory via loads and stores, and 5) committing the process state in correct order so that precise interrupts can be supported.  ...  Acknowledgment: The authors would like to thank Mark Hill for comments and Scott Breach and Dionisios Pnevmatikatos for their help with some of the figures.  ... 
doi:10.1109/5.476078 fatcat:hhb5l7dxjveaffbe2e2zn2izxq

Composable Lightweight Processors

Changkyu Kim, Simha Sethumadhavan, M.S. Govindan, Nitya Ranganathan, Divya Gulati, Doug Burger, Stephen W. Keckler
2007 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)  
We evaluate one such design with 32 cores called TFlex, which can be configured as 32 dual-issue processors, or as a single 64-wide issue processor, or as any point in between.  ...  However, the number of processors and the granularity of each processor are fixed at design time.  ...  Acknowledgments: We thank Ramdass Nagarajan, Haiming Liu, Mark Gebhart, Bert Maher, Katherine Coons, Jeff Diamond and Behnam Robatmilli for their contribution to the paper.  ... 
doi:10.1109/micro.2007.41 dblp:conf/micro/KimSGRGBK07 fatcat:rlbsu4nftjbchnxvxbqhs4gzsu

Composable Lightweight Processors

Changkyu Kim, Simha Sethumadhavan, M.S. Govindan, Nitya Ranganathan, Divya Gulati, Doug Burger, Stephen W. Keckler
2007 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
We evaluate one such design with 32 cores called TFlex, which can be configured as 32 dual-issue processors, or as a single 64-wide issue processor, or as any point in between.  ...  However, the number of processors and the granularity of each processor are fixed at design time.  ...  Acknowledgments: We thank Ramdass Nagarajan, Haiming Liu, Mark Gebhart, Bert Maher, Katherine Coons, Jeff Diamond and Behnam Robatmilli for their contribution to the paper.  ... 
doi:10.1109/micro.2007.4408270 fatcat:m2zm2hxwczfc5gfihtqhmqe63a