24,399 Hits in 4.8 sec

Instruction flow-based front-end throttling for power-aware high-performance processors

Amirali Baniasadi, Andreas Moshovos
2001 Proceedings of the 2001 international symposium on Low power electronics and design - ISLPED '01  
We introduce a new class of methods that exploit information about instruction flow (rate of instructions passing through stages).  ...  Our methods reduce power dissipation by selectively turning on and off instruction fetch and decode.  ...  Shown in part (a) is a sequence of four instructions, a through d. Instructions a and b are independent, while d depends on c.  ... 
doi:10.1145/383082.383088 dblp:conf/islped/BaniasadiM01 fatcat:nbagrpjzqbdcjmkf4wlhuvuujq
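As a rough illustration of the idea in this abstract (switching fetch and decode on and off based on how fast instructions flow through the pipeline), the sketch below gates the front end in cycles where few instructions issued and the instruction queue already holds a backlog. The class name, both thresholds, and the exact gating condition are hypothetical assumptions, not the heuristics evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class FlowThrottle:
    """Hypothetical flow-based front-end gate; thresholds are illustrative."""

    issue_threshold: int = 2  # assumed: fewer issues than this = slow flow
    queue_threshold: int = 8  # assumed: more entries than this = backlog

    def fetch_enabled(self, issued_last_cycle: int, queue_occupancy: int) -> bool:
        # Turn fetch/decode off only when instructions are draining slowly
        # AND the queue already holds enough work for the back end.
        slow_flow = issued_last_cycle < self.issue_threshold
        backlog = queue_occupancy > self.queue_threshold
        return not (slow_flow and backlog)


def count_gated_cycles(trace: list[tuple[int, int]], gate: FlowThrottle) -> int:
    """Count cycles in which the front end would be switched off.

    Each trace entry is (instructions issued last cycle, queue occupancy).
    """
    return sum(1 for issued, occ in trace if not gate.fetch_enabled(issued, occ))


if __name__ == "__main__":
    trace = [(4, 3), (1, 10), (0, 12), (3, 12), (1, 5)]
    print(count_gated_cycles(trace, FlowThrottle()))  # 2
```

Requiring both conditions is a design choice meant to avoid starving the back end: gating on a low issue rate alone could switch fetch off exactly when the queue is nearly empty.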

Exploiting Postdominance for Speculative Parallelization

Mayank Agarwal, Kshitiz Malik, Kevin M. Woley, Sam S. Stone, Matthew I. Frank
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Task-selection policies are critical to the performance of any architecture that uses speculation to extract parallel tasks from a sequential thread.  ...  Computational resources were supported by an equipment donation from AMD Corp. and the National Science Foundation under grant EIA-0224453, and through the contribution of the use of a computing cluster  ...  Rather, it fetches and executes instructions that are control-independent of those branches.  ... 
doi:10.1109/hpca.2007.346207 dblp:conf/hpca/AgarwalMWSF07 fatcat:hthp2iczz5dhrpcyiw2syrzxza
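The abstract ties task selection to postdominance. As background, a branch's immediate postdominator is the nearest block that every path from the branch must reach, which makes it a natural control-independent point at which to start a new task. The sketch below computes postdominator sets with the textbook iterative data-flow method; the toy CFG and the function names are hypothetical, and it is not presented as the paper's task-selection algorithm.

```python
from typing import Optional


def postdominators(succ: dict[str, list[str]], exit_node: str) -> dict[str, set[str]]:
    """Iterative data-flow computation of postdominator sets.

    `succ` maps every node to its successor list; `exit_node` is the unique exit.
    """
    nodes = set(succ)
    pdom = {n: set(nodes) for n in nodes}  # start from "everything"
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == exit_node:
                continue
            succs = succ[n]
            if succs:
                # n plus whatever postdominates all of its successors
                new = set.intersection(*(pdom[s] for s in succs)) | {n}
            else:
                new = {n}
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_postdominator(pdom: dict[str, set[str]], n: str) -> Optional[str]:
    """Closest strict postdominator of n (None for the exit node)."""
    strict = pdom[n] - {n}
    if not strict:
        return None
    # Strict postdominators form a chain; the closest one has the largest set.
    return max(strict, key=lambda d: len(pdom[d]))


if __name__ == "__main__":
    # Hypothetical if-else hammock: A branches to B or C, both rejoin at D.
    cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
    pdom = postdominators(cfg, "E")
    print(immediate_postdominator(pdom, "A"))  # D
```

In this example D is where both sides of A's branch rejoin, so work starting at D does not depend on which way A goes.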

Exploiting speculative thread-level parallelism on a SMT processor [chapter]

Pedro Marcuello, Antonio González
1999 Lecture Notes in Computer Science  
The threads are speculative in the sense that they are created by predicting the future control flow of the program. Moreover, threads are not necessarily independent.  ...  To avoid the serialization that such dependences may cause, inter-thread dependences as well as the values that flow through them are predicted.  ...  This confirms the potential benefits of the fetch mechanism in terms of reduction in fetch bandwidth requirements.  ... 
doi:10.1007/bfb0100636 fatcat:2fgstvv4kza5rcq4rzqn3kcxfm

Thermal-aware memory management unit of 3D-stacked DRAM for 3D high definition (HD) video

Chih-Yuan Chang, Po-Tsang Huang, Yi-Chun Chen, Tian-Sheuan Chang, Wei Hwang
2014 2014 27th IEEE International System-on-Chip Conference (SOCC)  
Moreover, power reduction of up to 43.46% can be realized in low power mode by the dynamic thermal-aware refresh timing control and deep power down detection.  ...  The hierarchical MMU can improve bandwidth by 54.3% through command reordering and bank/rank interleaving.  ...  Fig. 15 presents the energy estimations with/without the proposed pre-fetch FIFOs, respectively. The pre-fetch technique can realize 43.46% energy reduction on average.  ... 
doi:10.1109/socc.2014.6948903 dblp:conf/socc/ChangHCCH14 fatcat:usgl5fe2obcdrdyv447pdssl5e

Performance-aware speculation control using wrong path usefulness prediction

Chang Joo Lee, Hyesoon Kim, Onur Mutlu, Yale N. Patt
2008 High-Performance Computer Architecture  
Fetch gating mechanisms have been proposed to gate the processor pipeline to reduce the wasted energy consumption due to wrong-path (i.e., mis-speculated) instructions.  ...  This paper proposes a comprehensive, low-cost speculation control mechanism that takes into account the usefulness of wrong-path execution, while effectively reducing the energy consumption due to useless  ...  independence: control-flow independent program portions that are executed twice, once before a misprediction and once after. Figure 3 shows a program segment from mcf and its control flow graph that  ... 
doi:10.1109/hpca.2008.4658626 dblp:conf/hpca/LeeKMP08 fatcat:eghcuesv5rf4fc5ftwxxihwjbq

The Vector-Thread Architecture

Ronny Krashinsky, Christopher Batten, Mark Hampton, Steve Gerding, Brian Pharris, Jared Casper, Krste Asanovic
2004 SIGARCH Computer Architecture News  
The control processor can use vector-fetch commands to broadcast instructions to all the VPs or each VP can use thread-fetches to direct its own control flow.  ...  [Figure 1: Abstract ... (labels: control processor issuing vector-fetch commands; VP0 to VP[vl-1], each with registers, ALUs and thread-fetch; cross-VP start/stop queue; memory)]  ...  These thread-fetches break the rigid control flow of traditional vector machines, enabling the VP threads to follow independent control paths.  ... 
doi:10.1145/1028176.1006736 fatcat:5kezezpkgvhy7dehzvzjqzfoaq

The vector-thread architecture

R. Krashinsky, C. Batten, M. Hampton, S. Gerding, B. Pharris, J. Casper, K. Asanovic
2004 IEEE Micro  
The control processor can use vector-fetch commands to broadcast instructions to all the VPs or each VP can use thread-fetches to direct its own control flow.  ...  [Figure 1: Abstract ... (labels: control processor issuing vector-fetch commands; VP0 to VP[vl-1], each with registers, ALUs and thread-fetch; cross-VP start/stop queue; memory)]  ...  These thread-fetches break the rigid control flow of traditional vector machines, enabling the VP threads to follow independent control paths.  ... 
doi:10.1109/mm.2004.90 fatcat:zp3cqejacjht5kxia7tmnhkpsy

Diverge-Merge Processor (DMP): Dynamic Predicated Execution of Complex Control-Flow Graphs Based on Frequently Executed Paths

Hyesoon Kim, Jose Joao, Onur Mutlu, Yale Patt
2006 Microarchitecture (MICRO), Proceedings of the Annual International Symposium on  
Our evaluations show that DMP outperforms a baseline processor with an aggressive branch predictor by 19.3% on average over SPEC integer 95 and 2000 benchmarks, through a reduction of 38% in pipeline flushes  ...  The key insight behind DMP is that most control-flow graphs look and behave like simple hammock (if-else) structures when only frequently executed paths in the graphs are considered.  ...  Related Work on Control Flow Independence: Several hardware mechanisms were proposed to exploit control flow independence [35, 36, 11, 8, 16].  ... 
doi:10.1109/micro.2006.20 dblp:conf/micro/KimJMP06 fatcat:5lhl3osqafbj7ic3zcoi6zytnu
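Dynamic predicated execution of a hammock, as the title describes, fetches both the if and else paths under a predicate and reconciles register values where control merges. The sketch below models only that final merge step: for each register written on either path, a select keeps the value from the path the branch actually took. The function name and the register-file-as-dict model are hypothetical simplifications, not DMP's hardware design.

```python
def merge_at_reconvergence(
    regs_before: dict[str, int],
    taken_writes: dict[str, int],
    not_taken_writes: dict[str, int],
    branch_taken: bool,
) -> dict[str, int]:
    """Select-style merge after both hammock paths have executed.

    Assumes every register written on either path already exists in
    regs_before; registers not written on a path keep their old value.
    """
    merged = dict(regs_before)
    for reg in set(taken_writes) | set(not_taken_writes):
        taken_val = taken_writes.get(reg, regs_before[reg])
        not_taken_val = not_taken_writes.get(reg, regs_before[reg])
        # One select per register written on either path, driven by the
        # resolved branch outcome (the predicate).
        merged[reg] = taken_val if branch_taken else not_taken_val
    return merged


if __name__ == "__main__":
    regs = {"r1": 0, "r2": 5, "r3": 1}
    taken = {"r1": 10}
    not_taken = {"r1": 20, "r3": 7}
    print(merge_at_reconvergence(regs, taken, not_taken, branch_taken=True))
    # {'r1': 10, 'r2': 5, 'r3': 1}
```

Because both paths were fetched, no pipeline flush is needed when the branch resolves; only the selects depend on the outcome.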

Power-sensitive multithreaded architecture

John S. Seng, Dean M. Tullsen, George Z. N. Cai
2012 2012 IEEE 30th International Conference on Computer Design (ICCD)  
The greatest reductions come in the front of the pipeline (e.g. fetch), which is always the slowest to recover from a branch misprediction.  ...  Pipeline gating [11] uses branch confidence prediction [9, 7] to control speculation in a conventional single-threaded pipeline, stopping fetch beyond low-confidence branches.  ... 
doi:10.1109/iccd.2012.6378610 dblp:conf/iccd/SengTC12a fatcat:u5vcymjbbrgy5gtqxarx7glk4e
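The snippet summarizes pipeline gating: a branch-confidence estimator marks branches as low confidence, and fetch stops beyond them. The sketch below counts in-flight, unresolved low-confidence branches and gates fetch once that count reaches a threshold. The class, the method names, and the threshold of 2 are assumptions for illustration; a real design would also reset the count on a pipeline flush.

```python
from dataclasses import dataclass


@dataclass
class PipelineGate:
    """Hypothetical confidence-based fetch gate in the style of pipeline gating."""

    threshold: int = 2           # assumed gating threshold
    low_conf_in_flight: int = 0  # unresolved low-confidence branches

    def on_branch_fetched(self, high_confidence: bool) -> None:
        if not high_confidence:
            self.low_conf_in_flight += 1

    def on_branch_resolved(self, high_confidence: bool) -> None:
        if not high_confidence and self.low_conf_in_flight > 0:
            self.low_conf_in_flight -= 1

    def fetch_allowed(self) -> bool:
        # Stop fetching once too many low-confidence branches are unresolved,
        # since instructions beyond them are likely to be on the wrong path.
        return self.low_conf_in_flight < self.threshold


if __name__ == "__main__":
    gate = PipelineGate(threshold=2)
    gate.on_branch_fetched(high_confidence=False)
    gate.on_branch_fetched(high_confidence=True)
    print(gate.fetch_allowed())  # True: one low-confidence branch in flight
    gate.on_branch_fetched(high_confidence=False)
    print(gate.fetch_allowed())  # False: threshold reached, fetch is gated
    gate.on_branch_resolved(high_confidence=False)
    print(gate.fetch_allowed())  # True: fetch resumes
```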

A distributed control path architecture for VLIW processors

Hongtao Zhong, K. Fan, S. Mahlke, M. Schlansker
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
In this paper, we propose a distributed control path architecture for VLIW processors (DVLIW) to overcome the scalability problem of VLIW control paths.  ...  The architecture simplifies the dispersal of complex VLIW instructions and supports efficient distribution of instructions through a limited bandwidth interconnect, while supporting compressed instruction  ...  The third bar shows the total reduction of global traffic including both intercluster move and instruction fetch.  ... 
doi:10.1109/pact.2005.5 dblp:conf/IEEEpact/ZhongFMS05 fatcat:j6lo66yhp5dufhcpnot4s4fziu

Harnessing horizontal parallelism and vertical instruction packing of programs to improve system overall efficiency

Hai Lin, Yunsi Fei
2008 Proceedings of the conference on Design, automation and test in Europe - DATE '08  
... 71.1% reduction in the fetch energy consumption for a 4-way VLIW architecture with 8-entry IRFs).  ...  of independent operations.  ...  OR logic is used to combine the two pipes' status signals and output a fetch control signal for the instruction cache in the IF stage.  ... 
doi:10.1145/1403375.1403559 fatcat:55gxvy6ca5ewfcwmngkwuqjc6u

Exploiting criticality to reduce bottlenecks in distributed uniprocessors

Behnam Robatmili, Sibi Govindan, Doug Burger, Stephen W. Keckler
2011 2011 IEEE 17th International Symposium on High Performance Computer Architecture  
The framework exploits the fetch criticality information at a coarser granularity by reissuing all instructions in the blocks previously fetched into the merged cores.  ...  Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads.  ...  Trace processors exploit control independence by reusing control-independent traces in the window following misprediction events.  ... 
doi:10.1109/hpca.2011.5749749 dblp:conf/hpca/RobatmiliGBK11 fatcat:mferyc5rybelfl64gftvd6oy5m

80μW/MHz 0.68V Ultra Low-Power Variation-Tolerant Superscalar Dual-Core Application Processor

Youngsu Kwon, Jae-Jin Lee, Kyoung-Seon Shin, Jin-Ho Han, Kyung-Jin Byun, Nak-Woong Eum
2015 IEIE Transactions on Smart Processing and Computing  
The core implements intra-core low-power microarchitecture with minimal performance degradation in instruction fetch, branch prediction, scheduling, and execution units.  ...  Simultaneous single-cycle fetching of instructions in the same cache line enables sporadic activation of I$ resulting in power reduction.  ...  The separation of the Aldebaran core into sub-cores, IF, DEC, and EX enables independent control of instruction flow.  ... 
doi:10.5573/ieiespc.2015.4.2.071 fatcat:72errhgvvzghhggpe5mkcnkfia
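The snippet credits part of the power savings to fetching instructions that share a cache line without re-activating the I-cache each time. One simple way to see the effect is to count cache-array activations for a fetch-address stream when consecutive accesses to the same line are served from a buffer. The 32-byte line size, the single-line buffer, and the function name are assumptions, not details of the Aldebaran core.

```python
def icache_activations(fetch_addrs: list[int], line_bytes: int = 32) -> int:
    """Count I-cache array activations when fetches that hit the most
    recently read line are served from a one-line buffer instead."""
    activations = 0
    buffered_line = None
    for addr in fetch_addrs:
        line = addr // line_bytes
        if line != buffered_line:
            activations += 1  # different line: the I-cache must be read
            buffered_line = line
    return activations


if __name__ == "__main__":
    # 16 sequential 4-byte instructions, then a jump back to address 0.
    addrs = list(range(0, 64, 4)) + [0, 4]
    print(len(addrs), icache_activations(addrs))  # 18 3
```

Sequential code benefits most: here 18 fetches need only 3 array reads, while a taken branch to a different line always costs a new activation.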

Importance of wind conditions, fetch, and water levels on wave-generated shear stresses in shallow intertidal basins

S. Fagherazzi, P. L. Wiberg
2009 Journal of Geophysical Research  
Wiberg (2009), Importance of wind conditions, fetch, and water levels on wave-generated shear stresses in shallow intertidal basins,  ...  Our analysis unravels the interplay of basin morphology, tidal elevation, and wind direction on water depth, fetch, and the resulting wave-generated shear stresses.  ...  This research was supported by the Department of Energy NICCR program award TUL-538-06/07, by NSF through the VCR-LTER program award GA10618 -127104, and by the Office of Naval Research award N00014-07  ... 
doi:10.1029/2008jf001139 fatcat:3werbwjkwfdujeub4ui6ijnfs4

PIM lite

Shyamkumar Thoziyoor, Jay Brockman, Daniel Rinzler
2005 Proceedings of the 15th ACM Great Lakes symposium on VLSI - GLSVLSI '05  
However, any reduction in area provided through pitch-matching has to be viewed in proper perspective.  ...  : In the pair fetch stage, an <FP.IP> pair that is associated with the thread gets fetched from either the thread pool or from a special register called the "critical bin" if the thread is executing in  ... 
doi:10.1145/1057661.1057678 dblp:conf/glvlsi/ThoziyoorBR05 fatcat:4stcg6dg55cj5hyauxaucskwyi
Showing results 1-15 out of 24,399