Baring it all to software: Raw machines

E. Waingold, M. Taylor, D. Srikrishna, V. Sarkar, W. Lee, V. Lee, J. Kim, M. Frank, P. Finch, R. Barua, J. Babb, S. Amarasinghe (+1 others)
1997 Computer  
Baring It All to Software: Raw Machines. This innovative approach eliminates the traditional instruction set interface and instead exposes the details of a simple replicated architecture directly to the  ...  Acknowledgments: We thank Doug Burger for his incisive technical feedback and his help in condensing this article.  ...  This project is funded by US Defense Advanced Research Projects Agency contract DABT63-96-C-0036 and a National Science Foundation Presidential Young Investigator Award.  ... 
doi:10.1109/2.612254 fatcat:56kdywilqzfiralt33jgiq4aeq

Compilers for instruction-level parallelism

M. Schlansker, T.M. Conte, J. Dehnert, K. Ebcioglu, J.Z. Fang, C.L. Thompson
1997 Computer  
ILP in a software-centric approach employs a very long instruction word (VLIW) processor and relies on a compiler to statically parallelize and schedule code.  ...  In a hardware-centric implementation, ILP on a superscalar processor executes a sequential instruction stream.  ...  Superscalar processors The most visible ILP processors are general-purpose processors, for which superscalar technology is the current design of choice.  ... 
doi:10.1109/2.642817 fatcat:sqa3irdg3zcqzftmok3rpsv65a
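The software-centric approach in this entry, where a compiler statically parallelizes and schedules code for a VLIW processor, can be illustrated with a minimal greedy list-scheduling sketch. It assumes that every functional unit can execute any operation, that operations are listed in a valid dependence order, and that latencies are known at compile time; the names `Op` and `schedule_vliw` are hypothetical and not taken from the paper. Empty bundles in the output stand for the NOPs a VLIW compiler must emit when no operation is ready.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A hypothetical operation: its name, the ops it depends on, and its latency in cycles."""
    name: str
    deps: list[str] = field(default_factory=list)
    latency: int = 1


def schedule_vliw(ops, width):
    """Greedy cycle-by-cycle list scheduling into VLIW bundles of at most `width` ops.

    Assumes any op fits any slot and that `ops` is in a valid topological order.
    """
    ready_at = {}  # op name -> first cycle in which its result can be consumed
    bundles = []
    remaining = list(ops)
    cycle = 0
    while remaining:
        bundle = []
        for op in remaining:
            if len(bundle) == width:
                break  # all issue slots in this bundle are taken
            # An op is ready only if every producer finished in an earlier cycle.
            if all(d in ready_at and ready_at[d] <= cycle for d in op.deps):
                bundle.append(op)
        for op in bundle:
            ready_at[op.name] = cycle + op.latency
            remaining.remove(op)
        bundles.append([op.name for op in bundle])  # an empty list models a NOP bundle
        cycle += 1
    return bundles


if __name__ == "__main__":
    ops = [Op("a"), Op("b"), Op("c", ["a", "b"]), Op("d", ["a"]), Op("e", ["c", "d"])]
    print(schedule_vliw(ops, width=2))
    # A three-cycle load forces two empty (NOP) bundles before its consumer.
    print(schedule_vliw([Op("ld", latency=3), Op("add", ["ld"])], width=2))
```

The greedy policy simply takes ready operations in list order; production VLIW compilers typically rank ready operations by a priority such as critical-path length, which this sketch omits.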

Architectural differences of efficient sequential and parallel computers

Martti J. Forsell
2002 Journal of systems architecture  
For that purpose we analytically evaluate the performance of eight general-purpose processor architectures representing a wide range of both commercial and scientific processor designs in both single processor  ...  Thus, designing a computer for efficient sequential computation leads to a very different architecture than designing one for efficient parallel computation, and there exists no single optimal architecture  ...  In-order, multithreaded, L_s-stage inter-thread (super)pipelined, F parallel functional units. A multithreaded L_s-stage inter-thread superpipelined processor (O7) is a VLIW-style statically scheduled processor  ... 
doi:10.1016/s1383-7621(02)00064-4 fatcat:graadduelvdupc72gz6bpwsyxm

Hybrid multi-core architecture for boosting single-threaded performance

Jun Yan, Wei Zhang
2007 SIGARCH Computer Architecture News  
In this paper, we propose a compiler-driven heterogeneous multicore architecture, consisting of tightly-integrated VLIW (Very Long Instruction Word) and superscalar processors on a single chip, to automatically  ...  In the proposed multi-core architecture, while the high-performance VLIW core is used to run code segments with high instruction-level parallelism (ILP) extracted by the compiler, the superscalar core can  ...  Figure 1. A high-level overview of the VLIW/superscalar dual-core architecture.  ... 
doi:10.1145/1241601.1241603 fatcat:vjzotxsbo5dtvc6oe6wcifxcie

Scheduled dataflow: execution paradigm, architecture, and performance evaluation

K.M. Kavi, R. Giorgi, J. Arul
2001 IEEE transactions on computers  
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs.  ...  We have compared the execution cycles required for programs on SDF with the execution cycles required by programs on SimpleScalar (a superscalar simulator) by considering the essential aspects of these  ...  in part by the following grants from the US National Science Foundation: CCR-9796310, EIA-9805216, and EIA-9820147 and Italian grant from CNR 203.15.9/97. The authors also thank the anonymous reviewers for  ... 
doi:10.1109/tc.2001.947011 fatcat:e7cco3kjqvcopmukezqkuzyahq

A survey of processors with explicit multithreading

Theo Ungerer, Borut Robič, Jurij Šilc
2003 ACM Computing Surveys  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor to utilize a larger part of the issue bandwidth by issuing instructions from different  ...  ACKNOWLEDGMENTS The authors would like to thank anonymous reviewers for many valuable comments.  ... 
doi:10.1145/641865.641867 fatcat:u6x7jdmkfvexnm3culskjsoxwi

Scheduled dataflow: execution paradigm, architecture, and performance evaluation

K.M. Kavi, R. Giorgi, J. Arul
2001 IEEE transactions on computers  
Recent focus in the field of new processor architectures is mainly on VLIW (e.g., IA-64), superscalar, and superspeculative designs.  ...  We have compared the execution cycles required for programs on SDF with the execution cycles required by programs on SimpleScalar (a superscalar simulator) by considering the essential aspects of these  ...  in part by the following grants from the US National Science Foundation: CCR-9796310, EIA-9805216, and EIA-9820147 and Italian grant from CNR 203.15.9/97. The authors also thank the anonymous reviewers for  ... 
doi:10.1109/12.947003 fatcat:inhwbcvzrnhplobil2togefjg4

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 SIGPLAN notices  
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x  ...  As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory.  ...  We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1037187.1024396 fatcat:5jeulzqygbfnnkch33wohm3imi

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 Proceedings of the 11th international conference on Architectural support for programming languages and operating systems - ASPLOS-XI  
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x  ...  As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory.  ...  We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1024393.1024396 dblp:conf/asplos/BudiuVCG04 fatcat:ncnfj5flsrakpax7vhaf5io3ja

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 SIGARCH Computer Architecture News  
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x  ...  As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory.  ...  We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1037947.1024396 fatcat:5jkfjbhrdzamrdmhosahxd6dzu

Spatial computation

Mihai Budiu, Girish Venkataramani, Tiberiu Chelcea, Seth Copen Goldstein
2004 ACM SIGOPS Operating Systems Review  
processors; and (3) that ASIC implementations of ASH use three orders of magnitude less energy compared to high-end superscalar processors, while being on average only 33% slower in performance (3.5x  ...  As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory.  ...  We thank Dan Vogel for help with scripting and benchmark management. Finally, we wish to thank the many reviewers for their helpful comments.  ... 
doi:10.1145/1037949.1024396 fatcat:gycsxj3ebfhazpstc2dbx6ebiq

Multithreaded Processors

T. Ungerer
2002 Computer journal  
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  However, because horizontal losses will be smaller for two-issue than for high-bandwidth superscalars, a CMP of four two-issue processors will reach a higher utilization than an eight-issue superscalar  ... 
doi:10.1093/comjnl/45.3.320 fatcat:hlkkabuhrzhkrmuyqomzfmc6zm
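The utilization argument in this entry (horizontal losses, meaning issue slots left empty in cycles where some instructions do issue) can be made concrete with a toy model. The per-cycle ILP trace below is invented for illustration, each cycle's ready count is assumed to be independent of what issued earlier (nothing carries over), and the function names are hypothetical; the numbers say nothing about the measured results in the paper.

```python
# Hypothetical trace: TRACE[t][c] = independent instructions thread t could
# issue in cycle c. Values are invented for illustration only.
TRACE = [
    [3, 1, 0, 2],  # thread 0
    [2, 2, 1, 0],  # thread 1
    [1, 0, 3, 2],  # thread 2
    [0, 2, 2, 1],  # thread 3
]


def wide_superscalar(trace, width=8):
    """One thread on a single wide core: slots beyond that thread's ILP are wasted."""
    thread = trace[0]
    issued = sum(min(width, n) for n in thread)
    return issued / (width * len(thread))


def chip_multiprocessor(trace, cores=4, width=2):
    """One thread per narrow core; each core can waste only its own few slots."""
    cycles = len(trace[0])
    issued = sum(min(width, n) for thread in trace[:cores] for n in thread)
    return issued / (cores * width * cycles)


def simultaneous_multithreading(trace, width=8):
    """All threads compete for one wide core's issue slots every cycle."""
    cycles = len(trace[0])
    issued = sum(min(width, sum(t[c] for t in trace)) for c in range(cycles))
    return issued / (width * cycles)


if __name__ == "__main__":
    for model in (wide_superscalar, chip_multiprocessor, simultaneous_multithreading):
        print(f"{model.__name__}: {model(TRACE):.1%}")
```

All three configurations offer eight issue slots per cycle. With this made-up trace the single-threaded eight-issue core fills 6 of 32 slots, the four two-issue cores fill 20, and sharing the eight slots among all four threads fills 22, which mirrors the qualitative ordering the entry describes.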

Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  However, because horizontal losses will be smaller for two-issue than for high-bandwidth superscalars, a CMP of four two-issue processors will reach a higher utilization than an eight-issue superscalar  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

Analysis of the Task Superscalar Architecture Hardware Design

Fahimeh Yazdanpanah, Daniel Jimenez-Gonzalez, Carlos Alvarez-Martinez, Yoav Etsion, Rosa M. Badia
2013 Procedia Computer Science  
In this paper, we present a base implementation of the Task Superscalar architecture, as well as a new design with improved performance.  ...  The Task Superscalar is an experimental task-based dataflow scheduler that dynamically detects inter-task data dependencies, identifies task-level parallelism, and executes tasks in an out-of-order manner  ...  We would also like to thank the Xilinx University Program for its hardware and software donations.  ... 
doi:10.1016/j.procs.2013.05.197 fatcat:dpb7gqgez5f6lh3e3fu565gxgy
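This entry describes dynamic detection of inter-task data dependencies. A minimal software sketch of that idea follows, assuming each task declares the data objects it reads and writes and that tasks arrive in program order. The name `task_dependencies` is hypothetical, and the sketch models neither the hardware structures of the paper nor any operand renaming a real design might use to remove write-after-read and write-after-write edges.

```python
from collections import defaultdict


def task_dependencies(tasks):
    """Return {task name: set of earlier task names it must wait for}.

    `tasks` is a list of (name, reads, writes) tuples in program order.
    """
    last_writer = {}                      # data object -> most recent writing task
    readers_since_write = defaultdict(set)  # data object -> readers after that write
    deps = {}
    for name, reads, writes in tasks:
        d = set()
        for obj in reads:
            if obj in last_writer:        # read-after-write (true dependence)
                d.add(last_writer[obj])
        for obj in writes:
            if obj in last_writer:        # write-after-write (output dependence)
                d.add(last_writer[obj])
            d |= readers_since_write[obj]  # write-after-read (anti-dependence)
        # Update the tracking tables only after this task's dependences are known.
        for obj in reads:
            readers_since_write[obj].add(name)
        for obj in writes:
            last_writer[obj] = name
            readers_since_write[obj] = set()
        d.discard(name)
        deps[name] = d
    return deps


if __name__ == "__main__":
    program = [
        ("T1", [], ["A"]),
        ("T2", ["A"], ["B"]),
        ("T3", ["A"], ["C"]),
        ("T4", ["B", "C"], ["A"]),
    ]
    for task, waits_for in task_dependencies(program).items():
        print(task, sorted(waits_for))
```

In the example, T2 and T3 depend only on T1, so once T1 finishes they can run in parallel. T4 must wait for T2 and T3 (it reads B and C and overwrites A, which they read) as well as for T1 (it overwrites A).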

Architectural considerations for application-specific counterflow pipelines

B.R. Childers, J.W. Davidson
1999 Proceedings 20th Anniversary Conference on Advanced Research in VLSI  
As an example, the 4-way superscalar HP PA-8000 microprocessor [17] tolerates a cache miss penalty of 50 clock cycles, which may cause the processor to stall for up to 200 instructions.  ...  Application-specific processor design is a promising approach for meeting the performance and cost goals of a system.  ...  To get high performance, modern superscalar processors include multiple functional units for exploiting instruction-level parallelism.  ... 
doi:10.1109/arvlsi.1999.756034 dblp:conf/arvlsi/ChildersD99 fatcat:sm2hu2vhrvghjeifj2ca3p7zem
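The PA-8000 figure in this entry follows from a simple upper bound. Writing W for the issue width and P for the miss penalty in cycles, a stalled W-way superscalar processor can lose at most W issue slots in each of the P cycles:

```latex
N_{\text{lost}} \le W \times P = 4 \times 50 = 200 \ \text{instructions}
```

The bound is reached only if the processor could otherwise have issued a full W instructions in every cycle of the miss.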