118 Hits in 3.5 sec

Memory dependence speculation tradeoffs in centralized, continuous-window superscalar processors

A. Moshovos, G.S. Sohi
Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550)  
We focus on centralized, continuous-window processor models (the common configuration today). We confirm that exploiting load/store parallelism can greatly improve performance.  ...  We conclude by discussing why our findings differ, in part, from those reported for split, distributed window processor models.  ...  In this work we study memory dependence speculation under a continuous, centralized window processor model (continuous window). (In Section 2.2, we clarify the differences between these two models.)  ... 
doi:10.1109/hpca.2000.824359 dblp:conf/hpca/MoshovosS00 fatcat:crjrih7mizhrhashtygzwiuppe

Parallelism exploitation in superscalar multiprocessing

N.-P. Lu, C.-P. Chung
1998 IEE Proceedings - Computers and Digital Techniques  
This simulator models both a superscalar processor that can exploit instruction-level parallelism, and a shared-memory multiprocessor system that can exploit task-level parallelism.  ...  To exploit more parallelism in programs, superscalar multiprocessor systems, which exploit both fine-grained and coarsegrained parallelism, have been the trend in designing high-speed computing systems  ...  Currently, the superscalar processor core is well tested, and we are attempting to modify the task scheduler to support different memory consistency models and implement speculative memory access.  ... 
doi:10.1049/ip-cdt:19981955 fatcat:ih24325o5jcijevm5hwntouove

Decisive aspects in the evolution of microprocessors

D. Sima
2004 Proceedings of the IEEE  
parts or subsystems.² In order to avoid a large number of multiple references to superscalar processors in the text and in the figures, we give all references to superscalars only in Fig. 28.  ...  The incessant market demand for higher and higher processor performance called for a continuous increase of clock frequencies as well as an impressive evolution of the microarchitecture.  ...  As a tradeoff, second-generation superscalars allow branch speculation usually along a different number of unresolved conditional branches, e.g., along two conditional branches like in the Power2, or four  ... 
doi:10.1109/jproc.2004.837627 fatcat:lj6qx4lbojbzjgn4meo5n7f72m

PSATSim

Clint W. Smullen, Tarek M. Taha
2006 Proceedings of the 2006 workshop on Computer architecture education held in conjunction with the 33rd International Symposium on Computer Architecture - WCAE '06  
It is important for students in computer organization classes to understand the tradeoff between these two issues.  ...  This paper presents PSATSim, a graphical simulator that allows students to configure the design of a speculative out-of-order execution superscalar processor and see the effect of the design on both power  ...  Speculative execution allows the processor to speculate on branch outcomes and continue executing instructions, without disrupting the final state of the processor.  ... 
doi:10.1145/1275620.1275627 dblp:conf/wcae/SmullenT06 fatcat:k7oso2atxnfw7o5udee3luh674

A survey of processors with explicit multithreading

Theo Ungerer, Borut Robič, Jurij Šilc
2003 ACM Computing Surveys  
A multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  Four byte portions are fetched over the memory interface and put in the appropriate instruction window (IW).  ... 
doi:10.1145/641865.641867 fatcat:u6x7jdmkfvexnm3culskjsoxwi

Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective

Venkatesan Packirisamy, Yangchun Luo, Wei-Lung Hung, Antonia Zhai, Pen-Chung Yew, Tin-Fook Ngai
2008 2008 IEEE International Conference on Computer Design  
In terms of Energy-Delay-Squared product (ED²), SMT based TLS performs about 26% better than the best CMP based TLS configuration and 11% better than the superscalar architecture.  ...  In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a Superscalar with equal chip area.  ...  Superscalar configuration Our base configuration is a SimpleScalar-based Superscalar architecture. The architectural parameters of this processor can be found in Table I.  ... 
doi:10.1109/iccd.2008.4751875 dblp:conf/iccd/PackirisamyLHZYN08 fatcat:65vh22ou2fcwrcvpye3cu6wwce

HeDGE: Hybrid Dataflow Graph Execution in the Issue Logic [chapter]

Suriya Subramanian, Kathryn S. McKinley
2009 Lecture Notes in Computer Science  
Exposing more instruction-level parallelism in out-of-order superscalar processors requires increasing the number of dynamic in-flight instructions.  ...  HeDGE explicitly maintains dependences between instructions in the issue window by modifying the issue, register renaming, and wakeup logic.  ...  in superscalar processors.  ... 
doi:10.1007/978-3-540-92990-1_23 fatcat:uy6xqh5365altnbbojwctud2sy

Microarchitectural innovations: boosting microprocessor performance beyond semiconductor technology scaling

A. Moshovos, G.S. Sohi
2001 Proceedings of the IEEE  
The second category includes memory hierarchies, branch predictors, trace caches, and memory-dependence predictors.  ...  The first category includes pipelining, superscalar execution, out-of-order execution, register renaming, and techniques to overlap memory-accessing instructions.  ...  (a) Continuous, centralized instruction window (e.g., typical dynamically scheduled superscalar). (b) Multiscalar's way of building a large instruction window. (c) A loop.  ... 
doi:10.1109/5.964438 fatcat:ewlfjz4cyzgdfbrw54ubn4pfu4

Hybrid Dataflow/von-Neumann Architectures

Fahimeh Yazdanpanah, Carlos Alvarez-Martinez, Daniel Jimenez-Gonzalez, Yoav Etsion
2014 IEEE Transactions on Parallel and Distributed Systems  
Finally, we compare a representative set of recent general purpose hybrid dataflow/von-Neumann architectures, discuss their different approaches, and explore the evolution of these hybrid processors.  ...  Although different implementations differ in the way they merge the conceptually different computational models, they all follow similar principles: they harness the parallelism and data synchronization  ...  The processor relies on hardware mechanisms that determine dynamically data dependencies among the instructions in the instruction window.  ... 
doi:10.1109/tpds.2013.125 fatcat:rswr6zvvjjamxjjca6p2cfhh5y

Disjoint out-of-order execution processor

Mageda Sharafeddine, Komal Jothi, Haitham Akkary
2012 ACM Transactions on Architecture and Code Optimization (TACO)  
High-performance superscalar architectures used to exploit instruction level parallelism in single-thread applications have become too complex and power hungry for the multicore processors era.  ...  Our architecture combines speculative multithreading (SpMT) with checkpoint recovery and continual flow pipeline architectures.  ...  superscalar processors.  ... 
doi:10.1145/2355585.2355592 fatcat:3mrp3fyihfgtnitoli35mhmtdy

Optimal architectures and algorithms for mesh-connected parallel computers with separable row/column buses

M.J. Serrano, B. Parhami
1993 IEEE Transactions on Parallel and Distributed Systems  
Performance Tradeoffs in Multistreamed Superscalar Architectures by Mauricio Jose Serrano Superscalar processors employ multiple functional unit designs that can dispatch several instructions every cycle  ...  Variants in the design are possible because of tradeoffs that can be made in the design. We explore several problems present in multistreamed architectures and discuss possible solutions.  ...  We present two studies of tradeoffs in a multistreamed superscalar processor.  ... 
doi:10.1109/71.246069 fatcat:yspb36qsvzdujaw2awrky4brwq

Exploiting short-lived variables in superscalar processors

L.A. Lozano, G.R. Gao
1995 Proceedings of the 28th Annual International Symposium on Microarchitecture  
The importance of this tradeoff can be better exemplified by the two current tendencies in superscalar processor implementations.  ...  Another important issue for superscalar processors is to be able to deal with dependencies between instructions that access memory.  ... 
doi:10.1109/micro.1995.476839 dblp:conf/micro/LozanoG95 fatcat:jjqt7xr3mfb7lic5oe2eetem3q

Instruction-Level Parallel Processing

J. A. Fisher, R. Rau
1991 Science  
To continue this performance growth, microprocessor designers have incorporated Instruction-level Parallelism (ILP) into new designs.  ...  This is the cumulative result of architectural improvements as well as increases in circuit speed.  ...  In the case of superscalar processors, the hardware can only consider some number of nearby operations (called the instruction window) at one time.  ... 
doi:10.1126/science.253.5025.1233 pmid:17831442 fatcat:uznnxs5q3nay5fk6zwqla2wqey

An investigation of the performance of various instruction-issue buffer topologies

S. Jourdan, P. Sainrat, D. Litaize
1995 Proceedings of the 28th Annual International Symposium on Microarchitecture  
The importance of this tradeoff can be better exemplified by the two current tendencies in superscalar processor implementations.  ...  Another important issue for superscalar processors is to be able to deal with dependencies between instructions that access memory.  ... 
doi:10.1109/micro.1995.476837 dblp:conf/micro/JourdanSL95 fatcat:wuhuhegebrcnjfzysttify635q

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency

Kunle Olukotun, Lance Hammond, James Laudon
2007 Synthesis Lectures on Computer Architecture  
Further advances in both superscalar issue and pipelining are also limited by the fact that they require ever-larger numbers of transistors to be integrated into the high-speed central logic within each  ...  On conventional systems, the only way to take advantage of this kind of parallelism is to have a superscalar processor with an instruction window large enough to find parallelism among the individual instructions  ...  As was mentioned previously, in a very large loop body a single memory dependence violation near the end of the loop can result in a large amount of work being discarded.  ... 
doi:10.2200/s00093ed1v01y200707cac003 fatcat:qyjilavdhfcmlnc46l5sxg7ssq
Showing results 1 — 15 out of 118 results