Multiple branch and block prediction

S. Wallace, N. Bagherzadeh
Proceedings Third International Symposium on High-Performance Computer Architecture  
Accurate branch prediction and instruction fetch prediction of a microprocessor are critical to achieve high performance.  ...  As a result, a two block, multiple branch prediction mechanism for a block width of 8 instructions achieves an effective fetching rate of 8 instructions per cycle on the SPEC95 benchmark suite.  ...  In this paper, however, we present a scalable mechanism to perform multiple branch and multiple block prediction using the PHT and NLS concepts. Seznec et al.  ... 
doi:10.1109/hpca.1997.569645 dblp:conf/hpca/WallaceB97 fatcat:t2znawnzrzfgtakd36luw5cnq4

Completion time multiple branch prediction for enhancing trace cache performance

Ryan Rakvic, Bryan Black, John Paul Shen
2000 SIGARCH Computer Architecture News  
Results: A realistic-size TMP (72KB) can predict 1, 2, 3, and 4 consecutive blocks with compounded prediction accuracies of 96%, 93%, 87%, and 82%, respectively.  ...  It employs a tree structure of branch predictors, or tree-node predictors, and achieves accurate multiple branch prediction by leveraging the high accuracies of the individual branch predictors.  ...  This work was supported in part by ONR (N00014-96-1-0347, N00014-96-1-0928) and in part by Intel Corp.  ... 
doi:10.1145/342001.339654 fatcat:j3z3aiav3vfdjc3uttboleusjq

Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

Tse-Yu Yeh, Deborah T. Marr, Yale N. Patt
1993 Proceedings of the 7th international conference on Supercomputing - ICS '93  
We present a mechanism for predicting multiple branches and fetching multiple non-consecutive basic blocks each cycle which is both viable and effective.  ...  Viable mechanisms for fetching multiple non-consecutive basic blocks have not been previously investigated.  ...  We are particularly grateful to Intel and Motorola for technical and financial support, and to NCR for the gift of an NCR 3550, which is a useful compute server in much of our work.  ... 
doi:10.1145/165939.165956 dblp:conf/ics/YehMP93 fatcat:hjpgoqdmhrcoldpqgmqzzl3mvu

Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

Tse-Yu Yeh, Deborah T. Marr, Yale N. Patt
2014 25th Anniversary International Conference on Supercomputing Anniversary Volume  
We present a mechanism for predicting multiple branches and fetching multiple non-consecutive basic blocks each cycle which is both viable and effective.  ...  Viable mechanisms for fetching multiple non-consecutive basic blocks have not been previously investigated.  ...  We are particularly grateful to Intel and Motorola for technical and financial support, and to NCR for the gift of an NCR 3550, which is a useful compute server in much of our work.  ... 
doi:10.1145/2591635.2667167 fatcat:n23o6cduwff47h3f3tgrc7oesm

Enlarging Instruction Streams

Oliverio J. Santana, Alex Ramirez, Mateo Valero
2007 IEEE transactions on computers  
We call a sequence of instructions from the target of a taken branch to the next taken branch, potentially containing multiple basic blocks, a stream.  ...  Consequently, we propose the multiple-stream predictor, a novel mechanism that deals with all branch types by combining single streams into long virtual streams.  ...  Basic block A is the target of a taken branch and the next taken branch is found at the end of basic block D.  ... 
doi:10.1109/tc.2007.70742 fatcat:2ggb4k6yrbfdnjf3u3qzjlj3tq

A trace cache microarchitecture and evaluation

E. Rotenberg, S. Bennett, J.E. Smith
1999 IEEE transactions on computers  
It will eventually become necessary to fetch multiple basic blocks per clock cycle.  ...  However, for one benchmark whose performance is limited by branch mispredictions, the performance gain is due almost entirely to improved prediction accuracy.  ...  We would also like to give special thanks to Quinn Jacobson for his valuable input and for providing access to next trace prediction simulators.  ... 
doi:10.1109/12.752652 fatcat:5nrm3ihc5rcpzlqop3dkpmkcjq

The block-based trace cache

Bryan Black, Bohuslav Rychlik, John Paul Shen
1999 SIGARCH Computer Architecture News  
Performance potential of the block-based trace cache is quantified and compared with perfect branch prediction and perfect fetch schemes.  ...  Results: Using the SPECint95 benchmarks, a 16-wide realistic design of a block-based trace cache can improve performance 75% over a baseline design and to within 7% of a baseline design with perfect branch  ...  This suggests that perfectly predicting just the first taken branch is not enough. Being able to make multiple-branch predictions and fetch from multiple predicted targets is very beneficial.  ... 
doi:10.1145/307338.300996 fatcat:dygrp6f3nfdklipb7gcmavkote

Instruction fetch architectures and code layout optimizations

A. Ramirez, J.L. Larriba-Pey, M. Valero
2001 Proceedings of the IEEE  
Overall, we show how instruction fetch has evolved from fetching one instruction every few cycles, to fetching one instruction per cycle, to fetching a full basic block per cycle, to several basic blocks  ...  This means that a faster execution engine also requires a faster fetch engine, to ensure that it is possible to read and decode enough instructions to keep the pipeline full and the functional units busy  ...  Fig. 14. Extension of the superscalar fetch engine with a multiple branch predictor to read multiple consecutive basic blocks per cycle.  ... 
doi:10.1109/5.964440 fatcat:yp3a5e42wbfjtfkqsyfr5dkrcq

Enhancing instruction scheduling with a block-structured ISA

Stephen Melvin, Yale Patt
1995 International journal of parallel programming  
We show that a block-structured ISA utilizes both dynamic and compile-time mechanisms for exploiting instruction level parallelism and has significant performance advantages over a conventional ISA.  ...  In this paper we discuss some previous techniques along with their hardware and software requirements. Then we propose a new paradigm for an instruction set architecture (ISA): block-structuring.  ...  They often have small basic blocks and conditional branches that are hard to predict statically.  ... 
doi:10.1007/bf02577867 fatcat:yo27fovzwvfsvp6uddauzidjyu

Analysis of the TRIPS prototype block predictor

Nitya Ranganathan, Doug Burger, Stephen W. Keckler
2009 2009 IEEE International Symposium on Performance Analysis of Systems and Software  
between block prediction accuracies and branch prediction accuracies.  ...  Simulation-driven analysis identifies short history lengths, inadequate offset bits in the branch target buffers, and aliasing in the exit and target predictors as the main reasons for the predictor inefficiency  ...  Acknowledgments This research is supported by a Defense Advanced Research Projects Agency contract F33615-01-C-4106 and by NSF CISE Research Infrastructure grant EIA-0303609.  ... 
doi:10.1109/ispass.2009.4919651 dblp:conf/ispass/RanganathanBK09 fatcat:spvuve44gbdprmprp2j4j2kaue

Uncertainty-Aware Lung Nodule Segmentation with Multiple Annotations [article]

Qiuli Wang, Han Yang, Lu Shen, Mengke Zhang
2021 arXiv   pre-print
Experimental results show that our method can predict the reasonable regions with higher uncertainty and improve lung nodule segmentation performance in LIDC-IDRI.  ...  We introduce a Feature-Aware Concatenation structure for different learning targets and let each branch have a specific learning preference.  ...  Dual-Branch U-Net (which add the Feature-Aware Concatenation Block to the Dual-Branch U-Net) and our method.  ... 
arXiv:2110.12372v1 fatcat:udv55j7oxbbmflslfcgu72tfxa

Characterizing the impact of predicated execution on branch prediction

Scott A. Mahlke, Richard E. Hank, Roger A. Bringmann, John C. Gyllenhaal, David M. Gallagher, Wen-mei W. Hwu
1994 Proceedings of the 27th annual international symposium on Microarchitecture - MICRO 27  
Even with sophisticated branch prediction techniques, many frequently executed branches remain difficult to predict.  ...  An architecture supporting predicated execution may allow the compiler to remove many of these hard-to-predict branches, reducing the number of branch mispredictions and thereby improving performance.  ...  Acknowledgements The authors would like to thank all members of the IMPACT research group and the anonymous referees whose comments and suggestions helped to improve the quality of this paper significantly  ... 
doi:10.1145/192724.192755 fatcat:xrjhapgubfeo7cxusbezblrxti

Multiple Hypothesis Colorization [article]

Mohammad Haris Baig, Lorenzo Torresani
2017 arXiv   pre-print
Second, many objects in the real world exhibit multiple possible colors.  ...  Since color information occupies a large proportion of the total storage size of an image, a method that can predict accurate color from its grayscale version can produce dramatic reduction in image file  ...  For the Branch Prediction module in the ImageNet experiments we used the features from the layers "UpSample-4", "Conv-1" and "Block-3".  ... 
arXiv:1606.06314v3 fatcat:zb4g5tf4rfdg5cfhe2vaurlzee

Role of Multiblocks in Control Flow Prediction using Parallel Register Sharing Architecture

Rajendra Kumar, P K Singh
2010 International Journal of Computer Applications  
The main idea behind this concept is to use a step beyond the prediction of common branch and permitting the architecture to have the information about the CFG (Control Flow Graph) components of the program  ...  By this the size of initiation is increased that allows the overlapped execution of multiple independent flow of control. The multiple branch instruction can also be allowed.  ...  It used traversal of multiple branches in a single prediction. The effect on the accuracy of the branch prediction was not seen uniform across all programs.  ... 
doi:10.5120/815-1156 fatcat:i4auzcygdzb47jnv72urhz5yzi

Optimising Dynamic Binary Modification Across ARM Microarchitectures

Cosmin Gorgovan, Amanieu d'Antras, Mikel Luján
2018 Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering - ICPE '18  
issue, in order cores up to 6-issue out-of-order cores and including less traditional implementations.  ...  The ARM hardware ecosystem poses unique challenges for high performance DBM systems because of the large number and wide  ...  If, for example, block C in the CFG shown in Figure 2 is on the hot code path and its indirect branch has a 70% bias toward block E, 30% toward block F and never branches to block G, then both the E  ... 
doi:10.1145/3184407.3184425 dblp:conf/wosp/GorgovanDL18 fatcat:5uc5nvvppfdw5masjqe5yws35y