37,929 Hits in 9.5 sec

Dynamically reducing pressure on the physical register file through simple register sharing

Liem Tran, N. Nelson, Fung Ngai, S. Dropsho, M. Huang
2004 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)  
The first technique dynamically combines physical registers having the same value.  ...  Despite the simplicity, our design reduces the required number of physical registers by more than 10% on some applications, and provides almost half of the total benefits of an aggressive (complex) scheme  ...  Our approach is complementary to these approaches in that we reduce the demand of physical registers through sharing.  ... 
doi:10.1109/ispass.2004.1291358 dblp:conf/ispass/TranNNDH04 fatcat:qx7t5ugjurcnpe4ohpd4h43l5e
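The register-sharing idea in the snippet above (architectural registers holding the same value can be mapped to one physical register) can be illustrated with a small model. This is a minimal sketch, not the authors' design: the class `SharingRenamer` is hypothetical, and finding an identical value through a Python dict is an assumption standing in for whatever detection logic real hardware would use. Each physical register is reference-counted, and it returns to the free list only when no architectural register maps to it.

```python
class SharingRenamer:
    """Toy rename table (hypothetical) in which architectural registers that
    hold the same value share one reference-counted physical register.

    Assumption: identical values are found through a Python dict; this stands
    in for hardware detection logic and is not the paper's mechanism."""

    def __init__(self, num_phys):
        self.num_phys = num_phys
        self.free_list = list(range(num_phys))
        self.refcount = {}    # phys reg -> number of architectural mappings
        self.value_of = {}    # phys reg -> value it holds
        self.by_value = {}    # value -> phys reg holding it
        self.rename_map = {}  # arch reg -> phys reg

    def _release(self, preg):
        # Drop one mapping; recycle the register once nothing maps to it.
        self.refcount[preg] -= 1
        if self.refcount[preg] == 0:
            del self.refcount[preg]
            del self.by_value[self.value_of.pop(preg)]
            self.free_list.append(preg)

    def write(self, areg, value):
        """Map areg to a physical register holding value, sharing if possible."""
        preg = self.by_value.get(value)
        if preg is None:
            if not self.free_list:
                raise RuntimeError("out of physical registers")
            preg = self.free_list.pop(0)
            self.value_of[preg] = value
            self.by_value[value] = preg
            self.refcount[preg] = 0
        self.refcount[preg] += 1   # count the new mapping before releasing the old one
        old = self.rename_map.get(areg)
        self.rename_map[areg] = preg
        if old is not None:
            self._release(old)
        return preg

    def in_use(self):
        return self.num_phys - len(self.free_list)
```

For example, writing 0 to `r1` and `r2` and 5 to `r3` occupies two physical registers instead of three, and `r1` and `r2` map to the same one.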

Exploring the limits of early register release

Timothy M. Jones, Michael F. P. O'Boyle, Jaume Abella, Antonio González, Oğuz Ergin
2009 ACM Transactions on Architecture and Code Optimization (TACO)  
Register pressure in modern superscalar processors can be reduced by releasing registers early and by copying their contents to cheap back-up storage.  ...  On the other hand, compilers have a global view of the program and, using simple dataflow analysis, can determine the last use.  ...  By recycling physical registers much earlier than usual, register pressure is reduced.  ... 
doi:10.1145/1582710.1582714 fatcat:tjlrpgtys5bo7c7tw2j27ouvqa

Compiler directed early register release

T.M. Jones, M.F.P. O'Boyle, J. Abella, A. González, O. Ergin
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
This paper presents a novel compiler directed technique to reduce the register pressure and power of the register file by releasing registers early.  ...  Upon issuing an instruction with one of these logical registers as a source, the processor knows that there will be no more uses of it and can release the register through checkpointing.  ... 
doi:10.1109/pact.2005.14 dblp:conf/IEEEpact/JonesOAGE05 fatcat:wcy4bz3ribeznojxb47at26ela
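The two early-release entries above rely on the compiler finding each register's last use with dataflow analysis, so the processor can free the physical register when that instruction issues. A minimal sketch, assuming a single basic block with a known live-out set (the papers' analysis is global, across the whole control-flow graph); the function name `last_uses` and the `(dest, srcs)` instruction encoding are hypothetical. It is a standard backward liveness pass: a source that is not live after an instruction is at its last use there.

```python
def last_uses(block, live_out):
    """Return, for each instruction, the source registers whose values are
    dead afterwards (last uses) within one basic block.

    block: list of (dest, srcs) tuples, dest may be None (hypothetical encoding).
    live_out: set of registers live on exit from the block."""
    live = set(live_out)
    result = [set() for _ in block]
    for i in range(len(block) - 1, -1, -1):   # walk the block backwards
        dest, srcs = block[i]
        if dest is not None:
            live.discard(dest)                # the old value is overwritten here
        for s in srcs:
            if s not in live:                 # not read again: last use
                result[i].add(s)
        live.update(srcs)
    return result
```

With `block = [("r1", ["r2", "r3"]), ("r4", ["r1", "r2"]), ("r5", ["r4"])]` and `live_out = {"r5", "r3"}`, the second instruction is the last use of `r1` and `r2`, the third is the last use of `r4`, and `r3` is never released inside the block because it is live on exit.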

Asymmetrically banked value-aware register files for low-energy and high-performance

Shuai Wang, Hongyan Yang, Jie Hu, Sotirios G. Ziavras
2008 Microprocessors and Microsystems  
Our experimental evaluation with SPEC CINT2000 benchmark suite shows that AB-VARF reduces the energy consumption by 78.4% over a conventional register file, on the average, at the cost of a 0.7% performance  ...  register file designs.  ...  Other proposals try to reduce the required register file size for reduced access latency and energy consumption by delaying the physical register allocation [14, 26] , sharing physical registers to exploit  ... 
doi:10.1016/j.micpro.2007.10.004 fatcat:rw6allqxtrduzfqvh4yblqgune

Software-Directed Techniques for Improved GPU Register File Utilization

Dani Voitsechov, Arslan Zulfiqar, Mark Stephenson, Mark Gebhart, Stephen W. Keckler
2018 ACM Transactions on Architecture and Code Optimization (TACO)  
An in-depth evaluation on a large suite of applications shows that just our early register technique outperforms previous work on dynamic register allocation, and together these approaches, on average,  ...  This article seeks to increase the thread occupancy and improve performance of these register-bound applications by making more efficient use of the existing register file capacity.  ...  We also performed a sensitivity study on the size of the scalar register file, reducing it from 4KB to 3KB and 2KB.  ... 
doi:10.1145/3243905 fatcat:j4cejqjwcjerfat42nq5pop774

Efficient resources assignment schemes for clustered multithreaded processors

Fernando Latorre, Jose Gonzalez, Antonio Gonzalez
2008 Proceedings, International Parallel and Distributed Processing Symposium (IPDPS)  
On the other hand, clustering architectures have been widely studied in order to reduce the inherent complexity of current monolithic processors.  ...  On the one hand, exploiting instruction level parallelism is leading us to diminishing returns and therefore exploiting other sources of parallelism like thread level parallelism is needed in order to  ...  Physical register file: The other main shared resource where thread starvation occurs is the physical register file.  ... 
doi:10.1109/ipdps.2008.4536226 dblp:conf/ipps/LatorreGG08 fatcat:46m2rbhy3vehfed2pak2lw5ifa

Operand Registers and Explicit Operand Forwarding

J. Balfour, R.C. Harting, W.J. Dally
2009 IEEE Computer Architecture Letters  
An evaluation shows that capturing operand bandwidth close to the function units allows operand registers to reduce the energy consumed in the register files and forwarding network of an embedded processor  ...  Operand register files are small, inexpensive register files that are integrated with function units in the execute stage of the pipeline, effectively extending the pipeline operand registers into register  ...  Furthermore, reference filtering by the operand registers reduces demand for operand bandwidth from the shared general-purpose registers, which allows the number of read ports to the general-purpose register  ... 
doi:10.1109/l-ca.2009.45 fatcat:xqx3yka73fgfvfonvsupg7ax3y

Evaluating the use of register queues in software pipelined loops

G.S. Tyson, M. Smelyanskiy, E.S. Davidson
2001 IEEE Transactions on Computers  
Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers.  ...  Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining.  ...  The authors also thank Bob Rau and Alexandre Eichenberger for providing the loop kernels used in this study.  ... 
doi:10.1109/12.946998 fatcat:rsl6b7hforg5zbnxpkrsk2k34u

Evaluating the use of register queues in software pipelined loops

G.S. Tyson, M. Smelyanskiy, E.S. Davidson
2001 IEEE Transactions on Computers  
Using RQs, the compiler can allocate physical registers to store live values in the software pipelined loop while minimizing the pressure placed on architected registers.  ...  Through the use of RQs, we can minimize the register pressure and code expansion caused by software pipelining.  ...  The authors also thank Bob Rau and Alexandre Eichenberger for providing the loop kernels used in this study.  ... 
doi:10.1109/tc.2001.947006 fatcat:2db3qaphs5fmhfzo66flhxqyyi
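The register-queue (RQ) entries above describe keeping many in-flight loop values behind few architected register names. The toy model below is a sketch under simplifying assumptions, not the RQ semantics evaluated in the paper: `RegisterQueue` and `pipelined_loop` are hypothetical names, a write enqueues, and a read dequeues. A value produced in iteration `i` and consumed `distance` iterations later needs `distance` live copies; here they all sit in one queue under a single name, rather than in `distance` separately named registers.

```python
from collections import deque


class RegisterQueue:
    """Toy register queue (hypothetical): one architected name backed by a FIFO
    of physical slots. Assumption: writes enqueue and reads dequeue."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = deque()

    def write(self, value):
        if len(self.slots) == self.depth:
            raise OverflowError("register queue full")
        self.slots.append(value)

    def read(self):
        return self.slots.popleft()


def pipelined_loop(xs, distance):
    """Iteration i produces x*x; the value is consumed `distance` iterations
    later (here by adding 1). All in-flight values share one queue."""
    q = RegisterQueue(depth=distance)
    out = []
    for i, x in enumerate(xs):
        if i >= distance:              # steady state: consume iteration i - distance
            out.append(q.read() + 1)
        q.write(x * x)                 # produce this iteration's value
    while q.slots:                     # epilogue: drain the remaining values
        out.append(q.read() + 1)
    return out
```

Calling `pipelined_loop([1, 2, 3, 4], 2)` returns `[2, 5, 10, 17]`, the same result as computing `x * x + 1` for each element directly, while never holding more than two values in the queue.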

Reducing register pressure in SMT processors through L2-miss-driven early register release

Joseph J. Sharkey, Jason Loew, Dmitry V. Ponomarev
2008 ACM Transactions on Architecture and Code Optimization (TACO)  
The register file is one of the most critical datapath components limiting the number of threads that can be supported on a simultaneous multithreading (SMT) processor.  ...  To allow the use of smaller register files without degrading performance, techniques that maximize the efficiency of using registers through aggressive register allocation/deallocation can be considered  ...  Finally, the third set of solutions reduces the number of registers through the use of register sharing [Balakrishnan and Sohi 2003].  ... 
doi:10.1145/1455650.1455652 fatcat:54a4w3qoufc5zfxqlwhr6ejgqu

A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors

Mark Gebhart, Daniel R. Johnson, David Tarjan, Stephen W. Keckler, William J. Dally, Erik Lindholm, Kevin Skadron
2012 ACM Transactions on Computer Systems  
register file hierarchy reduces register file energy by 54%.  ...  Second, we propose replacing the monolithic register file found on modern designs with a hierarchical register file.  ...  Acknowledgments We thank the anonymous reviewers and the members of the NVIDIA Architecture Research Group for their comments.  ... 
doi:10.1145/2166879.2166882 fatcat:cwh624dhdbbcffra6mr6kkorgu

NoSQ: Store-Load Communication without a Store Queue

Tingting Sha, Milo M. K. Martin, Amir Roth
2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06)  
The primary benefit of NoSQ is a simple, fast datapath that does not contain store-load forwarding hardware; all loads get their values either from the data cache or from the register file.  ...  Acknowledgments The authors thank their reviewers for their comments and suggestions for improving this  ... 
doi:10.1109/micro.2006.39 dblp:conf/micro/ShaMR06 fatcat:nedxknqu4fhbvimd3fq5aeomxm

NoSQ: Store-Load Communication without a Store Queue

Tingting Sha, Milo M.K. Martin, Amir Roth
2007 IEEE Micro  
Moreover, SMB reduces register file pressure by allowing the definition and the load in a definition-store-load-use chain to share a single physical register.  ...  Extending the commit pipeline might increase pressure on core structures such as the reorder buffer, load and store queues, and register file.  ... 
doi:10.1109/mm.2007.17 fatcat:sbzoag742nbozpqjioodroxzfe
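The NoSQ entries describe speculative memory bypassing (SMB), in which the definition and the load in a definition-store-load-use chain share one physical register. The sketch below shows only that renaming step, under stated assumptions: `BypassingRenamer` is hypothetical; the predictor is assumed to map a load's PC to a store distance (1 means the most recent store); prediction verification and register freeing are not modeled. A predicted load reuses the physical register that supplied the store's data instead of allocating a new one.

```python
class BypassingRenamer:
    """Toy rename stage (hypothetical) illustrating speculative memory bypassing.

    Assumptions: predictor maps load PC -> store distance (1 = most recent
    store); verification and register freeing are not modeled."""

    def __init__(self, predictor):
        self.predictor = predictor   # load PC -> predicted store distance
        self.rename_map = {}         # arch reg -> phys reg
        self.store_data = []         # phys reg supplying each store's data, in order
        self.next_preg = 0
        self.allocated = 0

    def _alloc(self):
        preg = self.next_preg
        self.next_preg += 1
        self.allocated += 1
        return preg

    def define(self, dest):
        # An ordinary instruction writing dest gets a fresh physical register.
        self.rename_map[dest] = self._alloc()

    def store(self, data_reg):
        # Remember which physical register holds the stored value.
        self.store_data.append(self.rename_map[data_reg])

    def load(self, pc, dest):
        dist = self.predictor.get(pc)
        if dist is not None and dist <= len(self.store_data):
            # Predicted bypass: share the store's data register, no allocation.
            self.rename_map[dest] = self.store_data[-dist]
        else:
            # Unpredicted load: new register, value comes from the data cache.
            self.rename_map[dest] = self._alloc()
```

For the chain `define r1; store r1; load r2` with the load predicted to depend on the most recent store, `r2` maps to the same physical register as `r1` and only one register is allocated.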

Balancing register allocation across threads for a multithreaded network processor

Xiaotong Zhuang, Santosh Pande
2004 SIGPLAN Notices  
To reduce the register needs, move insertions are inserted at program points that split the live ranges or the nodes on the interference graph.  ...  We first estimate the register requirement bounds, then reduce from the upper bound gradually to achieve a good register balance among threads.  ...  The threads on one PU share the computation power of the PU and register files etc. Formally, the model is as follows: 1.  ... 
doi:10.1145/996893.996876 fatcat:5buakzxinje77fzqerf6sbktdi

Balancing register allocation across threads for a multithreaded network processor

Xiaotong Zhuang, Santosh Pande
2004 Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation - PLDI '04  
To reduce the register needs, move insertions are inserted at program points that split the live ranges or the nodes on the interference graph.  ...  We first estimate the register requirement bounds, then reduce from the upper bound gradually to achieve a good register balance among threads.  ...  The threads on one PU share the computation power of the PU and register files etc. Formally, the model is as follows: 1.  ... 
doi:10.1145/996841.996876 dblp:conf/pldi/ZhuangP04 fatcat:yau3ceqfojaynbszkonvddhfiq
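The Zhuang and Pande entries describe estimating per-thread register requirement bounds and then reducing from the upper bounds until the threads fit the shared register file. The sketch below follows only that outline. The selection rule is a hypothetical policy, not the paper's cost model: each step takes one register from the thread with the most slack above its lower bound. The function name `balance_registers` is also hypothetical. If the lower bounds already exceed the budget, no balance exists.

```python
def balance_registers(bounds, budget):
    """Start each thread at its upper bound and lower allocations one register
    at a time until the total fits the shared budget.

    bounds: list of (lower, upper) register requirement estimates per thread.
    Assumption (hypothetical policy): each step takes a register from the
    thread with the most slack above its lower bound (first such thread on ties)."""
    if sum(lo for lo, _ in bounds) > budget:
        raise ValueError("budget below the sum of lower bounds")
    alloc = [hi for _, hi in bounds]
    while sum(alloc) > budget:
        slack = [a - lo for a, (lo, _) in zip(alloc, bounds)]
        victim = max(range(len(alloc)), key=lambda t: slack[t])
        alloc[victim] -= 1           # slack > 0 here, so alloc stays >= lower bound
    return alloc
```

With bounds `[(8, 20), (10, 14), (6, 12)]` and a budget of 32 registers, the threads end up with `[10, 13, 9]`: each stays within its bounds, and the largest cuts come from the thread with the widest gap between its estimates.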
Showing results 1–15 out of 37,929 results