Improving Branch Prediction and Predicated Execution in Out-of-Order Processors

Eduardo Quiñones, Joan-Manuel Parcerisa, Antonio Gonzalez
2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Although it is globally beneficial, it has a negative side-effect because the removal of branches eliminates  ...  enables a very efficient implementation of if-conversion for an out-of-order processor; with almost  ...  no additional hardware cost, because the same hardware is used to predict the predicates of if-converted code and to predict branches without accuracy degradation.  ...  Acknowledgements This work is supported by the Spanish Ministry of Education and Science and FEDER funds of the EU under contracts TIN 2004-03072, and TIN 2004-07739-C02-01, and Intel Corporation.  ... 
doi:10.1109/hpca.2007.346186 dblp:conf/hpca/QuinonesPG07 fatcat:eqz2vp4m6ndg5ilsslmko27bkq

Wish Branches: Enabling Adaptive and Aggressive Predicated Execution

Hyesoon Kim, O. Mutlu, Y.N. Patt, J. Stark
2006 IEEE Micro  
A predicated branch remains predicated for all its dynamic instances even if it turns out to be very easy to predict at runtime.  ...  Predicated execution has been used to avoid performance loss because of hard-to-predict branches.  ...  Lee, HP TestDrive, Roy Ju, Derek Chiou, and the members of the HPS research group.  ... 
doi:10.1109/mm.2006.27 fatcat:xt4tpnx6b5djdj2qgoyqeypc5m

Selective predicate prediction for out-of-order processors

Eduardo Quiñones, Joan-Manuel Parcerisa, Antonio Gonzalez
2006 Proceedings of the 20th annual international conference on Supercomputing - ICS '06  
However, the use of predicated execution in out-of-order processors has to deal with two problems: there can be multiple definitions for a single destination register at rename time, and instructions with  ...  It is useful to eliminate hard-to-predict branches and to reduce the severe performance impact of branch mispredictions.  ...  Some studies have shown that predicated execution provides an opportunity to significantly improve hard-to-predict branch handling in out-of-order processors [4] [12].  ... 
doi:10.1145/1183401.1183410 dblp:conf/ics/QuinonesPG06 fatcat:x7b3z6pnjnefnhrcmkbdw5g64a

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Hyesoon Kim, Jose Joao, Onur Mutlu, Yale Patt
2006 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
The goal of this paradigm is to eliminate branch mispredictions due to hard-to-predict dynamic branches by dynamically predicating them without requiring ISA support for predicate registers and predicated  ...  We also compare DMP with previously proposed predication and dual-path/multipath execution paradigms in terms of performance, complexity, and energy consumption, and find that DMP is the highest performance  ...  We gratefully acknowledge the support of the Cockrell Foundation, Intel Corporation and the Advanced Technology Program of the Texas Higher Education Coordinating Board.  ... 
doi:10.1109/micro.2006.20 dblp:conf/micro/KimJMP06 fatcat:5lhl3osqafbj7ic3zcoi6zytnu

A reprogrammable customization framework for efficient branch resolution in embedded processors

Peter Petrov, Alex Orailoglu
2005 ACM Transactions on Embedded Computing Systems  
Experimental results show that for a representative set of control-dominated applications a reduction in the range of 3-22% in processor cycles can be achieved, thus extending the scope of low-cost embedded  ...  The increased processor utilization leads to a low-cost system implementation with no sacrifice in performance requirements and to reduced custom hardware in a typical SOC.  ...  -Resolving the frequently executed, hard-to-predict branches and folding them out with their target instructions results in proportionately greater performance and power improvements.  ... 
doi:10.1145/1067915.1067924 fatcat:q5u7lok6fzcppkggatkek6epoa

Predicate prediction for efficient out-of-order execution

Weihaw Chuang, Brad Calder
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
Predicated execution is an important optimization even for an out-of-order processor, since it can eliminate hard to predict branches and help to enable software pipelining.  ...  Using predication with out-of-order execution creates a naming bottleneck, because there can be multiple definitions reaching a use, and not knowing which use is the correct one can stall the processor  ...  CCR-0073551 and a grant from Intel Corporation. We especially would like to thank Intel for providing the Electron compiler sources, and their assistance in using it.  ... 
doi:10.1145/782837.782840 fatcat:dj43frlsubgotcczgcdo2434ie

Predicate prediction for efficient out-of-order execution

Weihaw Chuang, Brad Calder
2003 Proceedings of the 17th annual international conference on Supercomputing - ICS '03  
Predicated execution is an important optimization even for an out-of-order processor, since it can eliminate hard to predict branches and help to enable software pipelining.  ...  Using predication with out-of-order execution creates a naming bottleneck, because there can be multiple definitions reaching a use, and not knowing which use is the correct one can stall the processor  ...  CCR-0073551 and a grant from Intel Corporation. We especially would like to thank Intel for providing the Electron compiler sources, and their assistance in using it.  ... 
doi:10.1145/782814.782840 dblp:conf/ics/ChuangC03 fatcat:demrdw43czdurcqquziygfyji4

Speeding up control-dominated applications through microarchitectural customizations in embedded processors

Peter Petrov, Alex Orailoglu
2001 Proceedings of the 38th conference on Design automation - DAC '01  
A low-cost late customizable hardware that uses application information to fold out a set of frequently executed branches is described.  ...  processors in complex co-designs for control intensive systems. This assembly code is part of the ADPCM Encode benchmark [8] and was produced by gcc for the SimpleScalar toolset [9].  ...  Resolving the frequently executed, hard-to-predict branches and folding them out with their target instructions results in proportionately greater performance and power improvements.  ... 
doi:10.1145/378239.379014 dblp:conf/dac/PetrovO01 fatcat:uiukhdptgjgc3ojpl5t5d5jw2q

An EPIC Processor with Pending Functional Units [chapter]

Lori Carter, Weihaw Chuang, Brad Calder
2002 Lecture Notes in Computer Science  
The goal of this paper is to examine, in small steps, changing the in-order Itanium processor model to allow execution to be performed out-of-order.  ...  The Itanium processor, an implementation of an Explicitly Parallel Instruction Computing (EPIC) architecture, is an in-order processor that fetches, executes, and forwards results to functional units in-order  ...  This work was funded in part by NSF grant No. 0073551, a grant from Intel Corporation, and an equipment grant from Hewlett Packard and Intel Corporation.  ... 
doi:10.1007/3-540-47847-7_27 fatcat:wegmhtvlbvccjn4tn5ivv42hee

Achieving Superscalar Performance without Superscalar Overheads - A Dataflow Compiler IR for Custom Computing

Ali Mustafa Zaidi, David J. Greaves, Marc Herbstritt
2013 Imperial College Computing Student Workshop  
This paper addresses the problem of improving sequential performance in custom hardware by (a) switching from a statically scheduled to a dynamically scheduled (dataflow) execution model, and (b) developing  ...  Our custom hardware is able to approach the sequential cycle-counts of an Intel Nehalem Core i7 superscalar processor, while consuming on average only 0.25× the energy of an in-order Altera Nios IIf processor  ...  of out-of-order processors, allowing for better tolerance of variable latencies and statically unpredictable behaviour.  ... 
doi:10.4230/oasics.iccsw.2013.136 dblp:conf/iccsw/ZaidiG13 fatcat:5um2rvefbzf6do4hkrr6fgkbka

A comparison of full and partial predicated execution support for ILP processors

Scott A. Mahlke, Richard E. Hank, James E. McCormick, David I. August, Wen-Mei W. Hwu
1995 Proceedings of the 22nd annual international symposium on Computer architecture - ISCA '95  
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors.  ...  Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult.  ...  We also wish to extend thanks to Mike Schlansker and Vinod Kathail at HP Labs for their insightful discussions of the Playdoh model of predicated execution.  ... 
doi:10.1145/223982.225965 dblp:conf/isca/MahlkeHMAH95 fatcat:aalzzmqdhral5i2f37bxd2t2m4

A comparison of full and partial predicated execution support for ILP processors

Scott A. Mahlke, Richard E. Hank, James E. McCormick, David I. August, Wen-Mei W. Hwu
1995 SIGARCH Computer Architecture News  
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors.  ...  Although the potential benefits of predicated execution are high, the tradeoffs involved in the design of an instruction set to support predicated execution can be difficult.  ...  We also wish to extend thanks to Mike Schlansker and Vinod Kathail at HP Labs for their insightful discussions of the Playdoh model of predicated execution.  ... 
doi:10.1145/225830.225965 fatcat:sajrwuwyxjh27na4hoj4r3jody

Design Principles for Synthesizable Processor Cores [chapter]

Pascal Schleuniger, Sally A. McKee, Sven Karlsson
2012 Lecture Notes in Computer Science  
In this paper, we propose general design principles to increase instruction throughput on FPGA-based processor cores: first, superpipelining enables higher-frequency system clocks, and second, predicated  ...  We demonstrate through the use of micro-benchmarks that our principles guide the design of a processor core that improves performance by an average of 38% over a similar Xilinx MicroBlaze configuration  ...  The authors acknowledge the HiPEAC 2 European Network of Excellence.  ... 
doi:10.1007/978-3-642-28293-5_10 fatcat:qhrpdxwnubdvnk4zayrg7yr2v4

Non-uniform program analysis & repeatable execution constraints

Aravindh Anantaraman, Eric Rotenberg
2006 ACM SIGBED Review  
We exploit the fact that out-of-order processors can be analyzed via simulation in the absence of variable control-flow.  ...  Our second technique, Repeatable Execution Constraints for out-of-ORDER (RECORDER), defines constraints that guarantee a single input-independent execution time on an out-of-order pipeline for program segments  ...  Figure 5 shows the actual execution times (AETs) of the benchmark on an in-order scalar pipeline with static branch prediction (called simple) and an out-of-order 2-issue pipeline with dynamic branch prediction  ... 
doi:10.1145/1279711.1279716 fatcat:op4e4khrqrg3ffwa6mrwu3kbie

Improving the performance of object-oriented languages with dynamic predication of indirect jumps

Jose A. Joao, Onur Mutlu, Hyesoon Kim, Rishi Agarwal, Yale N. Patt
2008 SIGPLAN notices  
The hardware predicates the instructions between different targets of the jump and its CFM point if the jump turns out to be hard-to-predict at run time.  ...  This paper proposes a new way of handling hard-to-predict indirect jumps: dynamically predicating them.  ...  Part of this work was done while José Joao and Rishi Agarwal were interns at Microsoft Research.  ... 
doi:10.1145/1353536.1346293 fatcat:xk5kumpxgjg5hebooawabvheja