A comparison of scalable superscalar processors

Bradley C. Kuszmaul, Dana S. Henry, Gabriel H. Loh
1999 Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures - SPAA '99  
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community.  ...  These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution.  ...  Christopher Joerg of Compaq's Cambridge Research Laboratory pointed out the trend of increasing numbers of logical registers and argued that we should treat the number of logical registers as a scaling  ... 
doi:10.1145/305619.305633 dblp:conf/spaa/KuszmaulHL99 fatcat:yvylxfpvxjfjzhzz3lzdwxvgce

A Comparison of Asymptotically Scalable Superscalar Processors

B. C. Kuszmaul, D. S. Henry, G. H. Loh
2002 Theory of Computing Systems  
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community.  ...  These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution.  ...  Christopher Joerg of Compaq's Cambridge Research Laboratory pointed out the trend of increasing numbers of logical registers and argued that we should treat the number of logical registers as a scaling  ... 
doi:10.1007/s00224-001-1029-z fatcat:nfxub7wdlbb5hnclgj5tzgx2qm

Exploring branch target buffer access filtering for low-energy and high-performance microarchitectures

S. Wang, J. Hu, S.G. Ziavras
2012 IET Computers & Digital Techniques  
Powerful branch predictors along with a large branch target buffer (BTB) are employed in superscalar and simultaneous multi-threading (SMT) processors for instruction-level parallelism and thread-level  ...  For the simultaneous multi-threading environment, the authors evaluate the effectiveness of the BAF design and propose a banked BAF (BK-BAF) scheme to further reduce the energy consumption and performance  ...  at different drowsy intervals in the superscalar processor: (a) leakage reduction, (b) performance. Fig. 6: Performance scalability of the BAF in the superscalar processor. Fig. 7: Comparison of the dynamic  ... 
doi:10.1049/iet-cdt.2010.0102 fatcat:iwu5rr5jgnb5vitcywebjxkd3m

Multithreading decoupled architectures for complexity-effective general purpose computing

Michael Sung, Ronny Krashinsky, Krste Asanović
2001 SIGARCH Computer Architecture News  
It is argued that such a decoupled architecture is more complexity-effective and scalable than comparable superscalar processors, which incorporate enormous amounts of complexity for modest performance  ...  Decoupled architectures have not traditionally been used in the context of general purpose computing because of their inability to tolerate control-intensive code that exists across a wide range of applications  ...  Since a decoupled machine alleviates the need for centralized resources, it is inherently more scalable than corresponding superscalar processors.  ... 
doi:10.1145/563647.563658 fatcat:fjmdpove5ravhclvctbfurz6im

Scheduled dataflow: execution paradigm, architecture, and performance evaluation

K.M. Kavi, R. Giorgi, J. Arul
2001 IEEE transactions on computers  
architectures in order to have a fair comparison.  ...  Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs.  ...  In this paper, we present an architecture that can overcome this problem, with better scalability than superscalar processors with increased number of pipelines.  ... 
doi:10.1109/tc.2001.947011 fatcat:e7cco3kjqvcopmukezqkuzyahq

Scheduled dataflow: execution paradigm, architecture, and performance evaluation

K.M. Kavi, R. Giorgi, J. Arul
2001 IEEE transactions on computers  
architectures in order to have a fair comparison.  ...  Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs.  ...  In this paper, we present an architecture that can overcome this problem, with better scalability than superscalar processors with increased number of pipelines.  ... 
doi:10.1109/12.947003 fatcat:inhwbcvzrnhplobil2togefjg4

Hybrid multi-core architecture for boosting single-threaded performance

Jun Yan, Wei Zhang
2007 SIGARCH Computer Architecture News  
In this paper, we propose a compiler-driven heterogeneous multicore architecture, consisting of tightly-integrated VLIW (Very Long Instruction Word) and superscalar processors on a single chip, to automatically  ...  While multithreaded applications can naturally leverage the enhanced throughput of multi-core processors, a large number of important applications are single-threaded, which cannot automatically harness  ...  While this paper concentrates on studying a VLIW/superscalar dual-core, we also intend to investigate the scalability of hybrid multi-cores with different number of VLIW and superscalar processors.  ... 
doi:10.1145/1241601.1241603 fatcat:vjzotxsbo5dtvc6oe6wcifxcie

A new direction for computer architecture research

C.E. Kozyrakis, D.A. Patterson
1998 Computer  
BILLION-TRANSISTOR PROCESSORS Computer recently produced a special issue on "Billion-Transistor Architectures." 1 The first three articles discussed problems and trends that will affect future processor  ...  These devices will pose a different set of requirements for microprocessors and could redirect the emphasis of computer architecture research.  ...  The simultaneous multithreading (SMT) processor uses multithreading at the granularity of instruction issue slot to maximize the use of a wide-issue, out-of-order superscalar processor.  ... 
doi:10.1109/2.730733 fatcat:ykv5f53p5rfdfo4a72a4i25g2q

Microarchitecture of a Coarse-Grain Out-of-Order Superscalar Processor

Davor Capalija, Tarek S. Abdelrahman
2013 IEEE Transactions on Parallel and Distributed Systems  
We explore the design, implementation and evaluation of a coarse-grain superscalar processor in the context of the microarchitecture of the Control Processor (CP) of the Multi-Level Computing Architecture  ...  It does so in a fashion similar to how instruction-level parallelism is extracted by superscalar processors, i.e., using register renaming, out-of-order execution and scheduling.  ...  In comparison, a 4-way superscalar processor must rename 12 registers per cycle and thus requires a complex matching logic and a CAM-based 12-ported renaming table [19].  ... 
doi:10.1109/tpds.2012.135 fatcat:snvi5xnshvawbj3rsz4lsb3pra

Enabling HMMER for the Grid with COMP Superscalar

Enric Tejedor, Rosa M. Badia, Romina Royo, Josep L. Gelpí
2010 Procedia Computer Science  
In particular, we present a sequential version of the HMMER hmmpfam tool that, when run with COMP Superscalar, is decomposed into tasks and run on a set of distributed resources, not burdening the programmer  ...  Although performance is not a main objective of this work, we also present some test results where COMP Superscalar, using a new pre-scheduling technique, clearly outperforms a well-known parallelization  ...  Acknowledgment The authors gratefully acknowledge the financial support of the Comisión Interministerial de Ciencia y Tecnología (CICYT, Contract TIN2007-60625), the Generalitat de Catalunya (2009-SGR-  ... 
doi:10.1016/j.procs.2010.04.296 fatcat:2vudddibevfbrkftpg46wuilpq

On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications

F. Sanchez, M. Alvarez, E. Salami, A. Ramirez, M. Valero
2005 IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.  
In this paper we perform a scalability analysis of SIMD extensions for multimedia applications.  ...  Speed-ups over a 2-way superscalar processor with MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications and the matrix architecture can deliver, in some cases, more performance  ...  European Network of Excellence, and by IBM.  ... 
doi:10.1109/ispass.2005.1430571 dblp:conf/ispass/SanchezASRV05 fatcat:fnr3t7uibbfgbd2rn4tleat3s4

The Coming Wave of Multithreaded Chip Multiprocessors

James Laudon, Lawrence Spracklen
2007 International journal of parallel programming  
Finally, we present performance comparisons between Sun's Niagara and more conventional dual-core processors built from large superscalar processor cores.  ...  We examine two multi-threaded CMPs built using a large number of processor cores: Sun's Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with large number of threads.  ...  Fig. 11 shows a comparison of SPECjbb 2005 results between the Niagara-based SunFire T2000 and three IBM systems based on CMPs using more conventional superscalar POWER or x86 cores: the IBM p550, IBM  ... 
doi:10.1007/s10766-007-0033-6 fatcat:4gzhbtdumvablcjfy62osfb2g4

Scalable vector processors for embedded systems

C.E. Kozyrakis, D.A. Patterson
2003 IEEE Micro  
Acknowledgments We thank all the members of the IRAM research group at the University of California at Berkeley.  ...  Figure 2. TM1300 outperforms VIRAM-4L only for JPEG, for Performance-per-MHz comparison, normalized to the performance of the MPC7455 superscalar processor.  ...  A superscalar processor, on the other hand, can extract a much smaller amount of ILP from its sequential instruction streams.  ... 
doi:10.1109/mm.2003.1261385 fatcat:arrxeb4uk5ek3ohjheugjmxyji

Available Task-Level Parallelism on the Cell BE

Alejandro Rico, Alex Ramirez, Mateo Valero
2009 Scientific Programming  
In this paper we analyze the performance of Cell Superscalar, a task-based programming model for the Cell Broadband Engine Architecture, in terms of its scalability to higher number of on-chip processors  ...  Our results show that the low performance of the PPE component limits the scalability of some applications to less than 16 processors.  ...  This work has been supported, in part, by the Spanish Ministry of Science and Education, scholarship AP2005-4245 and contract CICYT TIN2007-60625; by the Scalable Architecture (SARC) project FP6/FET-27648  ... 
doi:10.1155/2009/741282 fatcat:lkqbeei4ovfchbowk7et2kdp2q

Achieving Superscalar Performance without Superscalar Overheads - A Dataflow Compiler IR for Custom Computing

Ali Mustafa Zaidi, David J. Greaves, Marc Herbstritt
2013 Imperial College Computing Student Workshop  
Our custom hardware is able to approach the sequential cycle-counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25× the energy of an in-order Altera Nios IIf processor  ...  Unfortunately, while it efficiently accelerates numeric, data-parallel applications, custom hardware often exhibits poor performance on sequential code, so complex, power-hungry superscalar processors  ...  Comparison (Cycle Count) vs an out-of-order Intel Nehalem Core i7 processor, and an Altera Nios IIf in-order processor.  ... 
doi:10.4230/oasics.iccsw.2013.136 dblp:conf/iccsw/ZaidiG13 fatcat:5um2rvefbzf6do4hkrr6fgkbka