Baring it all to software: Raw machines
1997
Computer
Baring It All to Software: Raw Machines. This innovative approach eliminates the traditional instruction set interface and instead exposes the details of a simple replicated architecture directly to the ...
Acknowledgments We thank Doug Burger for his incisive technical feedback and his help in condensing this article. ...
This project is funded by US Defense Advanced Research Projects Agency contract DABT63-96-C-0036 and a National Science Foundation Presidential Young Investigator Award. ...
doi:10.1109/2.612254
fatcat:56kdywilqzfiralt33jgiq4aeq
Compilers for instruction-level parallelism
1997
Computer
ILP in a software-centric approach employs a very long instruction word (VLIW) processor and relies on a compiler to statically parallelize and schedule code. ...
In a hardware-centric implementation, ILP on a superscalar processor executes a sequential instruction stream. ...
Superscalar processors The most visible ILP processors are general-purpose processors, for which superscalar technology is the current design of choice. ...
doi:10.1109/2.642817
fatcat:sqa3irdg3zcqzftmok3rpsv65a
Architectural differences of efficient sequential and parallel computers
2002
Journal of systems architecture
For that purpose we analytically evaluate the performance of eight general purpose processor architectures representing widely both commercial and scientific processor designs in both single processor ...
Thus, designing a computer for efficient sequential computation leads to a very different architecture than designing one for efficient parallel computation and there exists no single optimal architecture ...
In-order multithreaded L_s-stage inter-thread (super)pipelined, F parallel functional units. A multithreaded L_s-stage interthread superpipelined processor (O7) is a VLIW-style statically scheduled processor ...
doi:10.1016/s1383-7621(02)00064-4
fatcat:graadduelvdupc72gz6bpwsyxm
Hybrid multi-core architecture for boosting single-threaded performance
2007
SIGARCH Computer Architecture News
In this paper, we propose a compiler-driven heterogeneous multicore architecture, consisting of tightly-integrated VLIW (Very Long Instruction Word) and superscalar processors on a single chip, to automatically ...
In the proposed multi-core architecture, while the high-performance VLIW core is used to run code segments with high instruction-level parallelism (ILP) extracted by the compiler, the superscalar core can ...
Figure 1. A high-level overview of the VLIW/superscalar dual-core architecture. ...
doi:10.1145/1241601.1241603
fatcat:vjzotxsbo5dtvc6oe6wcifxcie
Scheduled dataflow: execution paradigm, architecture, and performance evaluation
2001
IEEE transactions on computers
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. ...
We have compared the execution cycles required for programs on SDF with the execution cycles required by programs on SimpleScalar (a superscalar simulator) by considering the essential aspects of these ...
in part by the following grants from the US National Science Foundation: CCR-9796310, EIA-9805216, and EIA-9820147, and an Italian grant from CNR 203.15.9/97. The authors also thank the anonymous reviewers for ...
doi:10.1109/tc.2001.947011
fatcat:e7cco3kjqvcopmukezqkuzyahq
A survey of processors with explicit multithreading
2003
ACM Computing Surveys
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple ...
Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different ...
ACKNOWLEDGMENTS The authors would like to thank anonymous reviewers for many valuable comments. ...
doi:10.1145/641865.641867
fatcat:u6x7jdmkfvexnm3culskjsoxwi
Scheduled dataflow: execution paradigm, architecture, and performance evaluation
2001
IEEE transactions on computers
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs. ...
We have compared the execution cycles required for programs on SDF with the execution cycles required by programs on SimpleScalar (a superscalar simulator) by considering the essential aspects of these ...
in part by the following grants from the US National Science Foundation: CCR-9796310, EIA-9805216, and EIA-9820147, and an Italian grant from CNR 203.15.9/97. The authors also thank the anonymous reviewers for ...
doi:10.1109/12.947003
fatcat:inhwbcvzrnhplobil2togefjg4
Spatial computation
2004
SIGPLAN notices
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x ...
As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ...
We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments. ...
doi:10.1145/1037187.1024396
fatcat:5jeulzqygbfnnkch33wohm3imi
Spatial computation
2004
Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x ...
As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ...
We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments. ...
doi:10.1145/1024393.1024396
dblp:conf/asplos/BudiuVCG04
fatcat:ncnfj5flsrakpax7vhaf5io3ja
Spatial computation
2004
SIGARCH Computer Architecture News
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x ...
As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ...
We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments. ...
doi:10.1145/1037947.1024396
fatcat:5jkfjbhrdzamrdmhosahxd6dzu
Spatial computation
2004
ACM SIGOPS Operating Systems Review
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x ...
As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ...
We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments. ...
doi:10.1145/1037949.1024396
fatcat:gycsxj3ebfhazpstc2dbx6ebiq
Multithreaded Processors
2002
Computer journal
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors. ...
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple ...
However, because horizontal losses will be smaller for two-issue than for high-bandwidth superscalars, a CMP of four two-issue processors will reach a higher utilization than an eight-issue superscalar ...
doi:10.1093/comjnl/45.3.320
fatcat:hlkkabuhrzhkrmuyqomzfmc6zm
Multi-Threaded Processors
[chapter]
2011
Encyclopedia of Parallel Computing
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors. ...
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple ...
However, because horizontal losses will be smaller for two-issue than for high-bandwidth superscalars, a CMP of four two-issue processors will reach a higher utilization than an eight-issue superscalar ...
doi:10.1007/978-0-387-09766-4_423
fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm
Analysis of the Task Superscalar Architecture Hardware Design
2013
Procedia Computer Science
In this paper, we present a base implementation of the Task Superscalar architecture, as well as a new design with improved performance. ...
The Task Superscalar is an experimental task-based dataflow scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks in an out-of-order manner ...
We would also like to thank the Xilinx University Program for its hardware and software donations. ...
doi:10.1016/j.procs.2013.05.197
fatcat:dpb7gqgez5f6lh3e3fu565gxgy
Architectural considerations for application-specific counterflow pipelines
1999
Proceedings 20th Anniversary Conference on Advanced Research in VLSI
As an example, the 4-way superscalar HP PA-8000 microprocessor [17] tolerates a cache miss penalty of 50 clock cycles, which may cause the processor to stall for up to 200 instructions. ...
Application-specific processor design is a promising approach for meeting the performance and cost goals of a system. ...
To get high performance, modern superscalar processors include multiple functional units for exploiting instruction-level parallelism. ...
doi:10.1109/arvlsi.1999.756034
dblp:conf/arvlsi/ChildersD99
fatcat:sm2hu2vhrvghjeifj2ca3p7zem