A survey of processors with explicit multithreading

Theo Ungerer, Borut Robič, Jurij Šilc
2003 ACM Computing Surveys  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  A multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline.  ...  A way to look at latencies that arise in a pipelined execution is the opportunity cost in terms of the instructions that might be processed while the pipeline is interlocked, for example, waiting for a  ... 
doi:10.1145/641865.641867 fatcat:u6x7jdmkfvexnm3culskjsoxwi

Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  In contrast, the multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline.  ...  Another way to look at latencies that arise in a pipelined execution is the opportunity cost in terms of the instructions that might be processed while the pipeline is interlocked, for example, waiting  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

Multithreaded Processors

T. Ungerer
2002 The Computer Journal  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  In contrast, the multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline.  ...  Another way to look at latencies that arise in a pipelined execution is the opportunity cost in terms of the instructions that might be processed while the pipeline is interlocked, for example, waiting  ... 
doi:10.1093/comjnl/45.3.320 fatcat:hlkkabuhrzhkrmuyqomzfmc6zm

Power-aware microarchitecture: design and modeling challenges for next-generation microprocessors

P.W. Cook, M. Gupta, V. Zyuban, J. Wellman, A. Buyuktosunoglu, P.N. Kudva, H. Jacobson, S.E. Schuster, P. Bose, D.M. Brooks
2000 IEEE Micro  
In an aggressive superscalar processor that doesn't employ any form of clock gating, a design's clock tree, drivers, and clocked latches (pipeline registers, buffers, queues) can consume up to 70% of the  ...  Such memories are often used to implement register files with renaming support and issue queue structures that require support for operand and instruction bus structures (instruction dispatch bus or result  ...  This will impose many more difficult challenges in identifying energy-saving opportunities in future designs.  ... 
doi:10.1109/40.888701 fatcat:ppinuavlsjf2bouizu2yhbmonm

Superspeculative microarchitecture for beyond AD 2000

M.H. Lipasti, J.P. Shen
1997 Computer  
recompilation or changes to the instruction set architecture.  ...  The experimental, superspeculative microarchitecture Superflow has a potential performance of 9.0 instructions per cycle and realizable performance of 7.3 IPC for the SPEC95 integer suite, without requiring  ...  SMT uniprocessors: Simultaneous multithreaded processors are superscalar uniprocessors that support multiple machine contexts and execute multiple instruction streams simultaneously.  ... 
doi:10.1109/2.612250 fatcat:ezukhsogtvcnjfga5hqpj5jcsi

The Garp architecture and C compiler

T.J. Callahan, J.R. Hauser, J. Wawrzynek
2000 Computer  
In many cases, these irregularities reduced the amount of parallelism and prevented the use of pipelining or memory queues.  ...  Superscalar: Superscalar processors can exploit parallelism in code that has been compiled for a sequential processor, and they can adjust their execution dynamically for operations with variable latency  ... 
doi:10.1109/2.839323 fatcat:otkluciqkva6vo6ejbjzgjjoya

VLIW compilation techniques for superscalar architectures [chapter]

Esther Stümpel, Michael Thies, Uwe Kastens
1998 Lecture Notes in Computer Science  
Abstract: Efficient use of multiple functional units in superscalar processors requires instruction-level parallelism to be detected and exploited.  ...  In our approach we reuse an existing retargetable VLIW compiler environment by instantiating it for a VLIW processor whose resources and instruction timings resemble those of the PowerPC.  ...  in superscalar processors.  ... 
doi:10.1007/bfb0026435 fatcat:kzi2rvw44rcuxeroxiyc4fecqq

An elementary processor architecture with simultaneous instruction issuing from multiple threads

Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura, Yoshimori Nakase, Teiji Nishizawa
1992 Proceedings of the 19th annual international symposium on Computer architecture - ISCA '92  
In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there  ...  Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize in vector or VLIW machines.  ...  Multithreading with Superscalar Design: In our simulator, each thread slot can support multiple instruction issuing from a single instruction stream similar to superscalar architectures.  ... 
doi:10.1145/139669.139710 dblp:conf/isca/HirataKNMNNN92 fatcat:2mtvetydrjberag77rtikjl2xa

An elementary processor architecture with simultaneous instruction issuing from multiple threads

Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura, Yoshimori Nakase, Teiji Nishizawa
1992 SIGARCH Computer Architecture News  
In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there  ...  Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize in vector or VLIW machines.  ...  Multithreading with Superscalar Design: In our simulator, each thread slot can support multiple instruction issuing from a single instruction stream similar to superscalar architectures.  ... 
doi:10.1145/146628.139710 fatcat:xklw4rswkbczjk63mmwkynn5vi

Multithreading with distributed functional units

B.K. Gunther
1997 IEEE Transactions on Computers  
The multiple pipeline approach is studied specifically in the Concurro processor architecture, a machine supporting multiple thread contexts and capable of context switching asynchronously in response to  ...  With suitable prefetching, multiple instruction caches can be avoided, and multithreading is shown to obviate the need for sophisticated instruction dispatch mechanisms on parallel workloads.  ...  Variations on superscalar [26], [27], VLIW (very long instruction word) [18], [29], and MIMD (multiple instruction/multiple data) [15] techniques used in conjunction with multithreading or multistreaming  ... 
doi:10.1109/12.588034 fatcat:bb67gixdrvgmjdeaxnnyjyhb6a

A survey of new research directions in microprocessors

J. Šilc, T. Ungerer, B. Robič
2000 Microprocessors and Microsystems  
Current microprocessors utilise the instruction-level parallelism by a deep processor pipeline and the superscalar instruction issue technique.  ...  VLSI technology offers several solutions for aggressive exploitation of the instruction-level parallelism in future generations of microprocessors.  ...  Deeper pipelining often results in dependence checking and dispatch in multiple pipelined stages.  ... 
doi:10.1016/s0141-9331(00)00072-7 fatcat:55y6n4wzijaeppl3l5qp6x2koa

Instruction Level Parallelism through Microthreading—A Scalable Approach to Chip Multiprocessors

Kostas Bousias, Nabil Hasasneh, Chris Jesshope
2005 The Computer Journal  
This mechanism allows superscalar processors to extract reasonably high levels of instruction-level parallelism (ILP).  ...  It supports distributed instruction issue and a fully scalable register file, which implements a distributed, shared-register model of communication and synchronization between multiple processors on a  ...  However, in practice, many modern VLIW architectures end up requiring many of the same complex mechanisms as superscalar processors [16], such as branch prediction, speculative loads, pipeline interlocks  ... 
doi:10.1093/comjnl/bxh157 fatcat:d73gokblevbtjcp6rwsz3y5s3q

Exploiting compiler-generated schedules for energy savings in high-performance processors

Madhavi Valluri, Lizy John, Heather Hanson
2003 Proceedings of the 2003 international symposium on Low power electronics and design - ISLPED '03  
This paper develops a technique that uniquely combines the advantages of static scheduling and dynamic scheduling to reduce the energy consumed in modern superscalar processors with out-of-order issue  ...  In this Hybrid-Scheduling paradigm, regions of the application containing large amounts of parallelism visible at compile-time completely bypass the dynamic scheduling logic and execute in a low power  ...  hardware (or out-of-order issue logic) responsible for identifying multiple instructions to issue in parallel.  ... 
doi:10.1145/871506.871608 dblp:conf/islped/ValluriJH03 fatcat:h5232vnpbfgvpioeprisbmdxmi

Exploiting compiler-generated schedules for energy savings in high-performance processors

Madhavi Valluri, Lizy John, Heather Hanson
2003 Proceedings of the 2003 international symposium on Low power electronics and design - ISLPED '03  
This paper develops a technique that uniquely combines the advantages of static scheduling and dynamic scheduling to reduce the energy consumed in modern superscalar processors with out-of-order issue  ...  In this Hybrid-Scheduling paradigm, regions of the application containing large amounts of parallelism visible at compile-time completely bypass the dynamic scheduling logic and execute in a low power  ...  hardware (or out-of-order issue logic) responsible for identifying multiple instructions to issue in parallel.  ... 
doi:10.1145/871604.871608 fatcat:ieffom4ytbc5ngifdifg3klzvi

Zero-cycle loads: microarchitecture support for reducing load latency

T.M. Austin, G.S. Sohi
1995 Proceedings of the 28th Annual International Symposium on Microarchitecture  
We present two pipeline designs supporting zero-cycle loads: one for pipelines with a single stage of instruction decode, and another for pipelines with multiple decode stages.  ...  Programs executing on processors with support for zero-cycle loads experience significantly fewer pipeline stalls due to load instructions and increased overall performance.  ...  This work was supported in part by NSF Grants CCR-9303030 and MIP-9505853, ONR Grant N00014-93-1-0465, and a donation from Intel Corp.  ... 
doi:10.1109/micro.1995.476815 dblp:conf/micro/AustinS95 fatcat:u4duppwmuje2jl2sgthdoyp6sy