A comparison of scalable superscalar processors
1999
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures - SPAA '99
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. ...
These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. ...
Christopher Joerg of Compaq's Cambridge Research Laboratory pointed out the trend of increasing numbers of logical registers and argued that we should treat the number of logical registers as a scaling ...
doi:10.1145/305619.305633
dblp:conf/spaa/KuszmaulHL99
fatcat:yvylxfpvxjfjzhzz3lzdwxvgce
A Comparison of Asymptotically Scalable Superscalar Processors
2002
Theory of Computing Systems
The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. ...
These networks provide the full functionality of superscalar processors including renaming, out-of-order execution, and speculative execution. ...
Christopher Joerg of Compaq's Cambridge Research Laboratory pointed out the trend of increasing numbers of logical registers and argued that we should treat the number of logical registers as a scaling ...
doi:10.1007/s00224-001-1029-z
fatcat:nfxub7wdlbb5hnclgj5tzgx2qm
Exploring branch target buffer access filtering for low-energy and high-performance microarchitectures
2012
IET Computers & Digital Techniques
Powerful branch predictors along with a large branch target buffer (BTB) are employed in superscalar and simultaneous multi-threading (SMT) processors for instruction-level parallelism and thread-level ...
For the simultaneous multi-threading environment, the authors evaluate the effectiveness of the BAF design and propose a banked BAF (BK-BAF) scheme to further reduce the energy consumption and performance ...
at different drowsy intervals in the superscalar processor: (a) leakage reduction, (b) performance
Fig. 6. Performance scalability of the BAF in the superscalar processor
Fig. 7. Comparison of the dynamic ...
doi:10.1049/iet-cdt.2010.0102
fatcat:iwu5rr5jgnb5vitcywebjxkd3m
Multithreading decoupled architectures for complexity-effective general purpose computing
2001
SIGARCH Computer Architecture News
It is argued that such a decoupled architecture is more complexity-effective and scalable than comparable superscalar processors, which incorporate enormous amounts of complexity for modest performance ...
Decoupled architectures have not traditionally been used in the context of general purpose computing because of their inability to tolerate control-intensive code that exists across a wide range of applications ...
Since a decoupled machine alleviates the need for centralized resources, it is inherently more scalable than corresponding superscalar processors. ...
doi:10.1145/563647.563658
fatcat:fjmdpove5ravhclvctbfurz6im
Scheduled dataflow: execution paradigm, architecture, and performance evaluation
2001
IEEE Transactions on Computers
architectures in order to have a fair comparison. ...
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. ...
In this paper, we present an architecture that can overcome this problem, with better scalability than superscalar processors with increased number of pipelines. ...
doi:10.1109/tc.2001.947011
fatcat:e7cco3kjqvcopmukezqkuzyahq
Scheduled dataflow: execution paradigm, architecture, and performance evaluation
2001
IEEE Transactions on Computers
architectures in order to have a fair comparison. ...
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. ...
In this paper, we present an architecture that can overcome this problem, with better scalability than superscalar processors with increased number of pipelines. ...
doi:10.1109/12.947003
fatcat:inhwbcvzrnhplobil2togefjg4
Hybrid multi-core architecture for boosting single-threaded performance
2007
SIGARCH Computer Architecture News
In this paper, we propose a compiler-driven heterogeneous multicore architecture, consisting of tightly-integrated VLIW (Very Long Instruction Word) and superscalar processors on a single chip, to automatically ...
While multithreaded applications can naturally leverage the enhanced throughput of multi-core processors, a large number of important applications are single-threaded, which cannot automatically harness ...
While this paper concentrates on studying a VLIW/superscalar dual-core, we also intend to investigate the scalability of hybrid multi-cores with different number of VLIW and superscalar processors. ...
doi:10.1145/1241601.1241603
fatcat:vjzotxsbo5dtvc6oe6wcifxcie
A new direction for computer architecture research
1998
Computer
Billion-transistor processors: Computer recently produced a special issue on "Billion-Transistor Architectures" [1]. The first three articles discussed problems and trends that will affect future processor ...
These devices will pose a different set of requirements for microprocessors and could redirect the emphasis of computer architecture research. ...
The simultaneous multithreading (SMT) processor uses multithreading at the granularity of instruction issue slot to maximize the use of a wide-issue, out-of-order superscalar processor. ...
doi:10.1109/2.730733
fatcat:ykv5f53p5rfdfo4a72a4i25g2q
Microarchitecture of a Coarse-Grain Out-of-Order Superscalar Processor
2013
IEEE Transactions on Parallel and Distributed Systems
We explore the design, implementation and evaluation of a coarse-grain superscalar processor in the context of the microarchitecture of the Control Processor (CP) of the Multi-Level Computing Architecture ...
It does so in a fashion similar to how instruction-level parallelism is extracted by superscalar processors, i.e., using register renaming, out-of-order execution and scheduling. ...
In comparison, a 4-way superscalar processor must rename 12 registers per cycle and thus requires a complex matching logic and a CAM-based 12-ported renaming table [19]. ...
doi:10.1109/tpds.2012.135
fatcat:snvi5xnshvawbj3rsz4lsb3pra
Enabling HMMER for the Grid with COMP Superscalar
2010
Procedia Computer Science
In particular, we present a sequential version of the HMMER hmmpfam tool that, when run with COMP Superscalar, is decomposed into tasks and run on a set of distributed resources, not burdening the programmer ...
Although performance is not a main objective of this work, we also present some test results where COMP Superscalar, using a new pre-scheduling technique, clearly outperforms a well-known parallelization ...
Acknowledgment The authors gratefully acknowledge the financial support of the Comisión Interministerial de Ciencia y Tecnología (CICYT, Contract TIN2007-60625), the Generalitat de Catalunya (2009-SGR- ...
doi:10.1016/j.procs.2010.04.296
fatcat:2vudddibevfbrkftpg46wuilpq
On the Scalability of 1- and 2-Dimensional SIMD Extensions for Multimedia Applications
2005
IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
In this paper we perform a scalability analysis of SIMD extensions for multimedia applications. ...
Speed-ups over a 2-way superscalar processor with MMX-like extension go up to 4X for kernels and up to 3.3X for complete applications and the matrix architecture can deliver, in some cases, more performance ...
European Network of Excellence, and by IBM. ...
doi:10.1109/ispass.2005.1430571
dblp:conf/ispass/SanchezASRV05
fatcat:fnr3t7uibbfgbd2rn4tleat3s4
The Coming Wave of Multithreaded Chip Multiprocessors
2007
International Journal of Parallel Programming
Finally, we present performance comparisons between Sun's Niagara and more conventional dual-core processors built from large superscalar processor cores. ...
We examine two multi-threaded CMPs built using a large number of processor cores: Sun's Niagara and Niagara 2 processors. We also explore the programming issues for CMPs with large number of threads. ...
Fig. 11 shows a comparison of SPECjbb 2005 results between the Niagara-based SunFire T2000 and three IBM systems based on CMPs using more conventional superscalar POWER or x86 cores: the IBM p550, IBM ...
doi:10.1007/s10766-007-0033-6
fatcat:4gzhbtdumvablcjfy62osfb2g4
Scalable vector processors for embedded systems
2003
IEEE Micro
Acknowledgments We thank all the members of the IRAM research group at the University of California at Berkeley. ...
Figure 2. TM1300 outperforms VIRAM-4L only for JPEG, for Performance-per-MHz comparison, normalized to the performance of the MPC7455 superscalar processor. ...
A superscalar processor, on the other hand, can extract a much smaller amount of ILP from its sequential instruction streams. ...
doi:10.1109/mm.2003.1261385
fatcat:arrxeb4uk5ek3ohjheugjmxyji
Available Task-Level Parallelism on the Cell BE
2009
Scientific Programming
In this paper we analyze the performance of Cell Superscalar, a task-based programming model for the Cell Broadband Engine Architecture, in terms of its scalability to higher number of on-chip processors ...
Our results show that the low performance of the PPE component limits the scalability of some applications to less than 16 processors. ...
This work has been supported, in part, by the Spanish Ministry of Science and Education, scholarship AP2005-4245 and contract CICYT TIN2007-60625; by the Scalable Architecture (SARC) project FP6/FET-27648 ...
doi:10.1155/2009/741282
fatcat:lkqbeei4ovfchbowk7et2kdp2q
Achieving Superscalar Performance without Superscalar Overheads - A Dataflow Compiler IR for Custom Computing
2013
Imperial College Computing Student Workshop
Our custom hardware is able to approach the sequential cycle-counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25× the energy of an in-order Altera Nios IIf processor ...
Unfortunately, while it efficiently accelerates numeric, data-parallel applications, custom hardware often exhibits poor performance on sequential code, so complex, power-hungry superscalar processors ...
Comparison (Cycle Count) vs an out-of-order Intel Nehalem Core i7 processor, and an Altera Nios IIf in-order processor. ...
doi:10.4230/oasics.iccsw.2013.136
dblp:conf/iccsw/ZaidiG13
fatcat:5um2rvefbzf6do4hkrr6fgkbka