Instruction recycling on a multiple-path processor

S. Wallace, D.M. Tullsen, B. Calder
1999 Proceedings Fifth International Symposium on High-Performance Computer Architecture  
On a multiple-path processor, which speculatively executes less likely paths of hard-to-predict branches, the work done along a speculative path is normally discarded if that path is found to be incorrect  ...  Instruction recycling and reuse are examined for a simultaneous multithreading architecture with multiple path execution.  ...  MIP-9701708, and a Digital Equipment Corporation external research grant No. US-0040-97.  ... 
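To make "reuse" concrete, here is a minimal sketch of a generic instruction-reuse buffer, assuming results are keyed by (PC, operand values). It illustrates the general idea that work computed on a squashed path can still be valid if the same instruction later sees identical inputs. It is not the recycling mechanism proposed in the paper, and `ReuseBuffer`, `record`, and `lookup` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ReuseBuffer:
    """Hypothetical reuse buffer mapping (pc, operand values) -> result."""
    capacity: int = 64
    entries: dict = field(default_factory=dict)

    def record(self, pc: int, operands: tuple, result: int) -> None:
        # Record a result computed on any path, including one later squashed.
        key = (pc, operands)
        if key not in self.entries and len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = result

    def lookup(self, pc: int, operands: tuple):
        # Reuse is only safe when the operand values match exactly.
        return self.entries.get((pc, operands))


rb = ReuseBuffer()
rb.record(0x400, (3, 4), 7)        # computed on a wrong path, then squashed
hit = rb.lookup(0x400, (3, 4))     # same inputs on the correct path: reusable
miss = rb.lookup(0x400, (3, 5))    # different inputs: must re-execute
```

The exact-match key is the conservative design choice: a reused result is only correct if every input is identical, which is why reuse keys on values rather than on register names.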
doi:10.1109/hpca.1999.744323 dblp:conf/hpca/WallaceTC99 fatcat:p2f6kkej3faodlcx6hmvecfaua

Scalar program performance on multiple-instruction-issue processors with a limited number of registers

S.A. Mahlke, W.Y. Chen, P.P. Chang, W.W. Hwu
1992 Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences  
First, multiple-instruction-issue processors can perform effectively without a large number of registers.  ...  In this paper the performance of multiple-instruction-issue processors with variable register file sizes is examined for a set of scalar programs. We make several important observations.  ...  Lee Hoevel at NCR, the AMD 29K Advanced Processor Development Division, the National Aeronautics and Space Administration NASA under Contract NASA NAG 1-613 in cooperation with the Illinois Computer laboratory  ... 
doi:10.1109/hicss.1992.183141 fatcat:fqjdthuf3nchhkrn6yvbokfjde


Abhishek Tiwari, Smruti R. Sarangi, Josep Torrellas
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
Finally, ReCycle can also convert slack into power reductions. For a 17FO4 pipeline, ReCycle increases the frequency by 12% and the application performance by 9% on average.  ...  As a result, the pipeline can be clocked with a period close to the average stage delay rather than the longest one.  ...  Consequently, we assume that critical paths are distributed in a spatially-uniform manner on the processor layout, except in the L2, whose paths we assume never affect the cycle time.  ... 
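The claim that the clock period can approach the average stage delay rather than the longest one can be checked with simple arithmetic. The sketch below is an idealization under stated assumptions: slack can be shifted freely between stages, and feedback loops and clock-skew limits are ignored. The stage delays are hypothetical normalized values, not data from the paper.

```python
def baseline_period(stage_delays):
    # Conventional pipeline: every stage gets the same period, so the
    # slowest stage sets the clock.
    return max(stage_delays)


def ideal_recycled_period(stage_delays):
    # Idealized slack redistribution: fast stages donate time to slow ones,
    # so the period approaches the average stage delay.
    return sum(stage_delays) / len(stage_delays)


delays = [1.00, 0.85, 0.90, 0.95]         # hypothetical normalized delays
base = baseline_period(delays)            # slowest stage: 1.00
ideal = ideal_recycled_period(delays)     # average: 0.925
frequency_gain = base / ideal - 1         # roughly 8% in this toy case
```

In a real pipeline the achievable period is bounded below by this average and above by the slowest stage; loops in the pipeline (for example, feedback paths) constrain how much slack can actually be moved.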
doi:10.1145/1250662.1250703 dblp:conf/isca/TiwariST07 fatcat:dwc4huanbzdrhb5uqtrxvnb7ni

Architecture of Embedded Microprocessors [chapter]

Eric Rotenberg, Aravindh Anantaraman
2005 Multiprocessor Systems-on-Chips  
Thus, a single processor powering all of these embedded systems must support arbitrary software, even though any one system has a fixed task-set.  ...  EMBEDDED VERSUS HIGH-PERFORMANCE PROCESSORS: A COMMON FOUNDATION Embedded processors are general-purpose in a different sense than the high-performance processors used in personal computers.  ...  SIMD instructions perform the same operation on multiple bytes or half-words in parallel, a very cheap form of superscalar execution.  ... 
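The defining property of the SIMD instructions described above is that each byte lane is processed independently, with no carries crossing lane boundaries. The sketch below emulates this in software with the well-known SWAR (SIMD within a register) trick for four packed unsigned bytes. It only illustrates lane independence; it does not model any particular embedded ISA's SIMD instructions, and `packed_add_u8x4` is a hypothetical name.

```python
LOW_BITS = 0x7F7F7F7F   # low 7 bits of each byte lane
HIGH_BITS = 0x80808080  # top bit of each byte lane


def packed_add_u8x4(a: int, b: int) -> int:
    """Add four unsigned 8-bit lanes packed in 32-bit words, wrapping per lane."""
    # Adding only the low 7 bits keeps each lane sum <= 0xFE, so no carry
    # can leak into the neighbouring lane.
    low_sum = (a & LOW_BITS) + (b & LOW_BITS)
    # XOR in each lane's top bits; the carry out of bit 7 is simply dropped,
    # which gives modulo-256 wraparound per lane.
    return (low_sum ^ ((a ^ b) & HIGH_BITS)) & 0xFFFFFFFF


result = packed_add_u8x4(0x01FF7F10, 0x01010120)
# Lanes: 0x01+0x01=0x02, 0xFF+0x01 wraps to 0x00, 0x7F+0x01=0x80, 0x10+0x20=0x30
# result == 0x02008030
```

Hardware SIMD achieves the same effect by cutting the adder's carry chain at lane boundaries, which is why it is so cheap compared with adding more functional units.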
doi:10.1016/b978-012385251-9/50018-9 fatcat:smv3zphpnjfvrh5mslzjvgz4fa

Kilo-instruction Processors [chapter]

Adrián Cristal, Daniel Ortega, Josep Llosa, Mateo Valero
2003 Lecture Notes in Computer Science  
Managing such a large number of in-flight instructions requires a microarchitectural change in the way the re-order buffer, the instruction queues and the physical registers are handled, since simply  ...  In this paper we present a survey of several techniques which try to solve these problems caused by thousands of in-flight instructions.  ...  One of them is [19] where the authors propose Cherry, a hybrid checkpoint/ROB-based mechanism that allows early recycling of multiple resources.  ... 
doi:10.1007/978-3-540-39707-6_2 fatcat:x5u4jojs7zct7lwuxmjbc7hleq

Multipath execution

Pritpal S. Ahuja, Kevin Skadron, Margaret Martonosi, Douglas W. Clark
1998 Proceedings of the 12th international conference on Supercomputing - ICS '98  
While associated increases in instruction-fetch-bandwidth requirements are not too surprising, a less expected result is the significance of having a separate return-address stack for each forked path.  ...  Using 4 paths and a relatively simple confidence predictor, multipath execution garners speedups of up to 30% compared to the single-path case, with an average speedup of 14.4% for the SPECint suite.  ...  In a single-path processor, the CPU handles a mis-speculated path by squashing all the instructions in the RUU after the mispredicted branch.  ... 
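The finding that each forked path benefits from its own return-address stack can be illustrated with a small model. This is a minimal sketch, assuming a fixed-depth stack that is copied when a path forks; `ReturnAddressStack` and `fork` are hypothetical names, and the model is not the simulator configuration used in the paper.

```python
class ReturnAddressStack:
    """Hypothetical fixed-depth return-address stack (RAS)."""

    def __init__(self, depth=16, entries=None):
        self.depth = depth
        self.entries = list(entries) if entries else []

    def push(self, return_addr):
        # A call pushes its return address; on overflow the oldest is lost.
        if len(self.entries) == self.depth:
            self.entries.pop(0)
        self.entries.append(return_addr)

    def pop(self):
        # A return predicts the most recent return address, if any.
        return self.entries.pop() if self.entries else None

    def fork(self):
        # Each forked path gets a private copy, so calls and returns on one
        # path cannot corrupt the other path's predictions.
        return ReturnAddressStack(self.depth, self.entries)


parent = ReturnAddressStack()
parent.push(0x100)
child = parent.fork()
child.push(0x200)            # a call executed only on the forked path
parent_top = parent.pop()    # unaffected by the child's call: 0x100
```

With a single shared stack, the forked path's push of `0x200` would sit on top of the parent's entry and cause the parent's next return to mispredict; copying on fork avoids this at the cost of extra storage per path.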
doi:10.1145/277830.277854 dblp:conf/ics/AhujaSMC98 fatcat:2t3pcex5lvflpeoj7wpt5v2sbu

An analysis of dag-consistent distributed shared-memory algorithms

Robert D. Blumofe, Matteo Frigo, Christopher F. Joerg, Charles E. Leiserson, Keith H. Randall
1996 Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures - SPAA '96  
on $P$ processors, each with an LRU cache of $C$ pages, is $O(T_1(C)/P + mCT_\infty)$, where $T_1(C)$ is the total work of the computation including page faults, $T_\infty$ is its critical-path length excluding page faults  ...  As a corollary to this theorem, we show that the expected number $F_P(C)$ of page faults incurred by a computation executed on $P$ processors can be related to the number $F_1(C)$ of serial page faults by  ...  To derive a recurrence for the critical-path length $T_\infty(n)$, we observe that with an infinite number of processors, only one of the 8 submultiplications is the bottleneck, because the 8 multiplications  ... 
doi:10.1145/237502.237574 dblp:conf/spaa/BlumofeFJLR96 fatcat:t5uauluuyrdvllg6umgerlcgxu

The performance model of SilkRoad - a multithreaded DSM system for clusters

Liang Peng, Weng-Fai Wong, Chung-Kwong Yuen
2003 CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.  
Extending Cilk's theoretical performance model, we show that with the RC dag consistent DSM, the expected execution time of a partially strict multithreaded computation on  ...  SilkRoad is built on the Cilk system with an extended memory consistency model which we call RC dag consistency.  ...  We would like to thank Charles Leiserson and Matteo Frigo of MIT for their suggestions and discussions on the problems we presented in the paper.  ... 
doi:10.1109/ccgrid.2003.1199406 dblp:conf/ccgrid/PengWY03 fatcat:nqyifpqwojgjtahqwshqhdknau

Multithreaded Processors

T. Ungerer
2002 Computer journal  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip.  ...  Instruction recycling has been explored on a multipath SMT processor proposed in [78].  ... 
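Why simultaneous multithreading recovers issue slots that a single thread leaves empty can be shown with a toy cycle model. This is a minimal sketch under simplifying assumptions: each cycle, each thread has some number of ready instructions, and slots are filled in fixed thread-priority order with no structural hazards. The trace values and the function name `issue_slot_utilization` are hypothetical.

```python
def issue_slot_utilization(ready_per_cycle, width):
    """ready_per_cycle[c][t] = instructions thread t could issue in cycle c."""
    used = 0
    for ready in ready_per_cycle:
        remaining = width
        # Fixed-priority fill: thread 0 first, then thread 1, and so on.
        for n in ready:
            take = min(n, remaining)
            used += take
            remaining -= take
    return used / (width * len(ready_per_cycle))


# Hypothetical 4-cycle trace for two threads on a 4-wide machine.
trace = [[1, 3], [0, 2], [2, 2], [4, 1]]
single = issue_slot_utilization([[r[0]] for r in trace], width=4)  # 7/16 slots
smt = issue_slot_utilization(trace, width=4)                       # 14/16 slots
```

The second thread fills slots that thread 0 cannot use because of its limited instruction-level parallelism, which is exactly the underutilization that SMT targets. Real SMT fetch and issue policies are more sophisticated than fixed priority.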
doi:10.1093/comjnl/45.3.320 fatcat:hlkkabuhrzhkrmuyqomzfmc6zm

Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip.  ...  Instruction recycling has been explored on a multipath SMT processor proposed in [78].  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

The impact of incorrectly speculated memory operations in a multithreaded architecture

R. Sendag, Ying Chen, D.J. Lilja
2005 IEEE Transactions on Parallel and Distributed Systems  
In this study, we examine how the load instructions executed on what turn out to be incorrectly executed program paths impact the memory system performance.  ...  We find that incorrect speculation (wrong execution) on the instruction and thread-level provides an indirect prefetching effect for the later correct execution paths and threads.  ...  The authors would like to thank the anonymous reviewers for their thoughtful comments on earlier versions of this paper.  ... 
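The indirect prefetching effect of wrong-path loads can be reproduced with a tiny cache model. This is a minimal sketch, assuming a fully associative LRU cache where squashed loads still fill lines; `LRUCache` and `correct_path_misses` are hypothetical names, and the block addresses are made up for illustration rather than taken from the study.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative LRU cache of block addresses."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, block):
        if block in self.lines:
            self.hits += 1
            self.lines.move_to_end(block)       # mark as most recently used
        else:
            self.misses += 1
            self.lines[block] = True
            if len(self.lines) > self.num_lines:
                self.lines.popitem(last=False)  # evict least recently used


def correct_path_misses(wrong_path_loads, correct_path_loads, num_lines=8):
    cache = LRUCache(num_lines)
    for block in wrong_path_loads:   # squashed loads still bring in lines
        cache.access(block)
    cache.hits = cache.misses = 0    # count only correct-path accesses
    for block in correct_path_loads:
        cache.access(block)
    return cache.misses


no_wrong_path = correct_path_misses([], [1, 2, 3, 4])        # 4 misses
with_wrong_path = correct_path_misses([2, 3], [1, 2, 3, 4])  # 2 misses
```

The same model also shows the opposite risk: with a small cache, wrong-path loads to unrelated blocks can evict lines the correct path needs, so the net effect depends on how often wrong-path addresses overlap with later correct-path ones.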
doi:10.1109/tpds.2005.36 fatcat:prajemixvvhvjokg67ai2hgtsq

SeMIA: Self-Similarity-Based IC Integrity Analysis

Yu Zheng, Shuo Yang, Swarup Bhunia
2016 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems  
The proposed approach, referred to as SeMIA, exploits intrinsic structural self-similarity in a design (e.g., multiple cores, multiple functional units of the same type, different parts of an adder) to  ...  Through extensive simulations, we show that for 15% inter- and 10% intra-die variations in threshold voltage for a 45nm CMOS process, over 98% of recycled chips can be reliably identified.  ...  Fig. 3 shows an example architecture of a multi-core superscalar processor with structural self-similarity at multiple levels, e.g., across cores, functional units (FUs), and sub-circuits of FUs (e.g.  ... 
doi:10.1109/tcad.2015.2449231 fatcat:o4urabcs7rbelo6wrfc2mryxpq

Software Logging under Speculative Parallelization [chapter]

María Jesús Garzarán, Milos Prvulovic, José María Llabería, Víctor Viñals, Lawrence Rauchwerger, Josep Torrellas
2004 High Performance Memory Systems  
Using simulations of a 16-processor CC-NUMA, we show that the execution time of such programs on a system with software logging is on average 36% shorter than on a system where caches can only hold a single  ...  Often, a cache may have to buffer the state of several tasks and, as a result, it may have to hold multiple versions of the same variable.  ...  They run on 16 processors.  ... 
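Holding "multiple versions of the same variable" means a read must pick the right version for the reading task. The sketch below assumes the usual speculative-parallelization convention that tasks are ordered by ID and a task sees the latest version written by itself or an earlier task. It is a generic illustration, not the software logging scheme of the chapter: `VersionedStore` is a hypothetical name, and dependence-violation detection is deliberately omitted.

```python
import bisect


class VersionedStore:
    """Hypothetical per-variable version store for ordered speculative tasks."""

    def __init__(self, memory):
        self.memory = dict(memory)  # committed (non-speculative) state
        self.versions = {}          # var -> list of (task_id, value), sorted

    def write(self, task_id, var, value):
        vers = self.versions.setdefault(var, [])
        ids = [t for t, _ in vers]
        i = bisect.bisect_left(ids, task_id)
        if i < len(vers) and vers[i][0] == task_id:
            vers[i] = (task_id, value)        # task overwrites its own version
        else:
            vers.insert(i, (task_id, value))  # keep versions in task order

    def read(self, task_id, var):
        # Latest version from this task or a predecessor, else committed state.
        vers = self.versions.get(var, [])
        ids = [t for t, _ in vers]
        i = bisect.bisect_right(ids, task_id)
        return vers[i - 1][1] if i > 0 else self.memory[var]

    def commit(self, task_id):
        # Assumes tasks commit in order, so task_id holds the oldest versions.
        for var, vers in list(self.versions.items()):
            if vers and vers[0][0] == task_id:
                self.memory[var] = vers.pop(0)[1]
            if not vers:
                del self.versions[var]


s = VersionedStore({"x": 0})
s.write(1, "x", 10)
s.write(2, "x", 20)
reads = [s.read(t, "x") for t in (0, 1, 2, 3)]  # [0, 10, 20, 20]
```

Keeping versions sorted by task ID makes the "closest predecessor" lookup a binary search; a hardware cache would instead tag each line with a task ID.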
doi:10.1007/978-1-4419-8987-1_12 fatcat:e6kgtugjvjdqvgjdht5ftoa7ai

Lightweight predication support for out of order processors

Mark Stephenson, Lixin Zhang, Ram Rangan
2009 2009 IEEE 15th International Symposium on High Performance Computer Architecture  
For instance, the only form of predication supported by modern OOO processors is a simple conditional move.  ...  In this paper, we introduce a generalized form of hammock predication, called predicated mutually exclusive groups, that requires few modifications to an existing processor pipeline, yet presents the compiler  ...  The unique identifiers are assigned to the guard instructions in the rename stage and recycled on commits and pipeline flushes.  ... 
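Assigning identifiers at rename and recycling them on commit and flush is a classic free-list pattern. The sketch below is a minimal illustration under simplifying assumptions: commit is in order, and a flush squashes every in-flight guard (a real misprediction would squash only instructions younger than the branch). `GuardIdAllocator` and its methods are hypothetical names, not the paper's hardware.

```python
from collections import deque


class GuardIdAllocator:
    """Hypothetical free list of guard identifiers."""

    def __init__(self, num_ids):
        self.free = deque(range(num_ids))
        self.in_flight = deque()  # allocated IDs in program order, oldest first

    def allocate(self):
        # Rename stage: return None (i.e. stall) when no identifier is free.
        if not self.free:
            return None
        gid = self.free.popleft()
        self.in_flight.append(gid)
        return gid

    def commit_oldest(self):
        # In-order commit retires the oldest guard and recycles its ID.
        gid = self.in_flight.popleft()
        self.free.append(gid)
        return gid

    def flush(self):
        # Simplified full flush: every in-flight ID returns to the free list.
        self.free.extend(self.in_flight)
        self.in_flight.clear()


alloc = GuardIdAllocator(2)
first, second, stalled = alloc.allocate(), alloc.allocate(), alloc.allocate()
# first == 0, second == 1, stalled is None (no free identifiers)
```

Because commit is in order, the oldest in-flight ID is always the one released, so a FIFO is sufficient; the size of the ID pool bounds how many guard instructions can be in flight at once.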
doi:10.1109/hpca.2009.4798255 dblp:conf/hpca/StephensonZR09 fatcat:exrb6ivuknhyzitwidr666bt4e

Energy-Efficient Thread-Level Speculation

J. Renau, K. Strauss, L. Ceze, Wei Liu, S.R. Sarangi, J. Tuck, J. Torrellas
2006 IEEE Micro  
One such alternative is thread-level speculation (TLS) on a chip multiprocessor (CMP).  ...  Our TLS CMP design relies on an efficient microarchitecture with out-of-order task spawning and a novel TLS compiler.  ...  If multiple speculative tasks coexist in a single processor, a cache might have to hold multiple versions of the same datum.  ... 
doi:10.1109/mm.2006.11 fatcat:zd2zg2xfgrenboie4btmi52hhq