24,907 Hits in 3.8 sec

Non-consistent dual register files to reduce register pressure

J. Llosa, M. Valero, E. Ayguade
Proceedings of 1995 1st IEEE Symposium on High Performance Computer Architecture  
Non-consistent dual register files support the bandwidth demands and the high register requirements, without penalizing either access time or implementation cost.  ...  This paper presents the non-consistent dual register file, an alternative implementation and management of the register file.  ...  the non-consistent dual register file.  ... 
doi:10.1109/hpca.1995.386558 dblp:conf/hpca/LlosaVA95 fatcat:di6k4egcbjdcbifaxjm2uw3nwu

A shared reconfigurable VLIW multiprocessor system

Fakhar Anjam, Stephan Wong, Faisal Nadeem
2010 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)  
The results show that we can achieve two times better performance for our dual-processor system (with shared resources) compared to a uni-processor system or a 2-cluster processor system for applications  ...  By utilizing a freely available compiler and simulator in our development framework, we are able to optimize our design and map any application written in C to our multiprocessor system.  ...  To reduce the pressure on the number of read and write ports of the shared register file, a clustered architecture is used.  ... 
doi:10.1109/ipdpsw.2010.5470734 dblp:conf/ipps/AnjamWN10 fatcat:kv7gdsryjfhtve52ntylhep4gm

Experimental analysis of the aeroacoustics of cascaded airfoils

Lori A. Perry
1993 Journal of the Acoustical Society of America  
The struts are likely to decorrelate the local pressure fluctuations along the trailing edge, which reduces the intensity of trailing edge sound production.  ...  The addition of support struts tends to reduce the OASPL.  ...  edge pressure fluctuations, and (3) Apply a goodness-of-fit estimate to smooth the resulting G(He) function.  ... 
doi:10.1121/1.406658 fatcat:4ufot274qfccjb4whnjqwxxsv4

Register allocation for fine grain threads on multicore processor

D.C. Kiran, S. Gurunarayanan, Janardan P. Misra, Munish Bhatia
2017 Journal of King Saud University: Computer and Information Sciences  
As each core of a multicore processor has a private register file, it results in reduced register pressure.  ...  To effectively utilize the potential benefits of the multicore processor, the sequential program must be split into small parallel regions to be run on different cores, and the register allocation must  ...  to existing register allocation approaches which construct the global interference graph and then perform simplification to reduce register pressure.  ... 
doi:10.1016/j.jksuci.2015.04.001 fatcat:avp6odbvurdqxftlwq4ebz6dae

Power-Aware Compilation for Register File Energy Reduction

José L. Ayala, Alexander Veidenbaum, Marisa López-Vallejo
2003 International Journal of Parallel Programming  
Total energy consumption in the register file is reduced by 65% with no appreciable performance penalty for MiBench benchmarks on an embedded processor.  ...  Optimal usage of the register file in terms of size is achieved and unused registers are put into a low-power state.  ...  Current sophisticated compiler optimizations also require larger register files and increase the register pressure.  ... 
doi:10.1023/b:ijpp.0000004510.66751.2e fatcat:nmhxoqj4hvgqjdret5hdihdlde

Effective Code Generation for Distributed and Ping-Pong Register Files: A Case Study on PAC VLIW DSP Cores

Yung-Chia Lin, Chia Han Lu, Chung-Ju Wu, Chung-Lin Tang, Yi-Ping You, Ya-Chaio Moo, Jenq-Kuen Lee
2007 Journal of Signal Processing Systems  
The PAC DSP utilizes port-restricted, distributed, and partitioned register file structures in addition to a heterogeneous clustered data-path architecture to attain low power consumption and a smaller  ...  This paper describes our application of the open research compiler infrastructure to a novel VLIW DSP (known as the PAC DSP core) and the specific design of code generation for its register file architecture  ...  pressure due to data duplication between two different private register files.  ... 
doi:10.1007/s11265-007-0059-4 fatcat:wwhl4a3d4vcurpebj3rwh2triq

Transient fault detection via simultaneous multithreading

Steven K. Reinhardt, Shubhendu S. Mukherjee
2000 SIGARCH Computer Architecture News  
Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise margins make future generations of microprocessors increasingly prone to transient hardware faults.  ...  Third, we identify the need for consistent replication of load values, and propose and evaluate two new mechanisms for satisfying this requirement.  ...  ACKNOWLEDGMENTS We thank Bob Jardine and Alan Wood from Compaq's Tandem Division for our numerous discussions with them on fault tolerance and their encouragement to pursue this research.  ... 
doi:10.1145/342001.339652 fatcat:25sgengr45bkti7tk6udgwx5oe

Transient fault detection via simultaneous multithreading

Steven K. Reinhardt, Shubhendu S. Mukherjee
2000 Proceedings of the 27th annual international symposium on Computer architecture - ISCA '00  
Smaller feature sizes, reduced voltage levels, higher transistor counts, and reduced noise margins make future generations of microprocessors increasingly prone to transient hardware faults.  ...  Third, we identify the need for consistent replication of load values, and propose and evaluate two new mechanisms for satisfying this requirement.  ...  ACKNOWLEDGMENTS We thank Bob Jardine and Alan Wood from Compaq's Tandem Division for our numerous discussions with them on fault tolerance and their encouragement to pursue this research.  ... 
doi:10.1145/339647.339652 fatcat:evwusqy7grb7rlfixtrwy5lehm

A Register File Architecture and Compilation Scheme for Clustered ILP Processors [chapter]

Krishnan Kailas, Manoj Franklin, Kemal Ebcioğlu
2002 Lecture Notes in Computer Science  
Our scheme makes use of a small Caching Register Buffer (CRB) attached to the traditional partitioned local register file, which is used to store copies of remote registers.  ...  Detailed experimental results show that a windowed CRB with just 4 entries provides the same performance as that of a partitioned register file with infinite non-architected register space for keeping  ...  Llosa et al. proposed a dual register file scheme which consists of two replicated, yet not fully consistent register files [21].  ... 
doi:10.1007/3-540-45706-2_68 fatcat:cw37lzv3svfotfwme3cpyawupy

Compiler directed early register release

T.M. Jones, M.F.R. O'Boyle, J. Abella, A. Gonzalez, O. Ergin
2005 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05)  
This paper presents a novel compiler directed technique to reduce the register pressure and power of the register file by releasing registers early.  ...  This reduces the occupancy of our banked register file, allowing banks to be turned off for power savings. Our scheme is faster, simpler and requires less hardware than recently proposed techniques.  ...  This leads to increased IPC and much reduced register pressure and static/dynamic power.  ... 
doi:10.1109/pact.2005.14 dblp:conf/IEEEpact/JonesOAGE05 fatcat:wcy4bz3ribeznojxb47at26ela

Energy-aware compilation and hardware design for VLIW embedded systems

Jose L. Ayala, Marisa Lopez Vallejo, David Atienza, Praveen Raghavan, Francky Catthoor, Diederik Verkest
2007 International Journal of Embedded Systems  
In this paper, we present a new approach to reduce the energy of shared register files in forthcoming embedded VLIW processors running real-life applications up to 60% without performance penalty.  ...  This approach relies on limited hardware extensions and a compiler-based energy-aware register assignment algorithm to deactivate at run-time parts of the register file (i.e., sub-banks) in an independent  ...  In the last years, several software pipelining strategies to distribute the use of the register file, targeted at reducing memory pressure in VLIW systems, have been outlined Jacome, 2000, 2001) .  ... 
doi:10.1504/ijes.2007.016035 fatcat:dlz4tegnnfekbizhgtsfrjr3ri

Balanced Bipartite Graph Based Register Allocation for Network Processors in Mobile and Wireless Networks

Feilong Tang, Ilsun You, Minyi Guo, Song Guo, Long Zheng
2010 Mobile Information Systems  
Intel's network processor IXP is specially designed for fast packet processing to achieve a broad bandwidth. IXP provides a large number of registers to reduce the number of memory accesses.  ...  In this paper, we investigate an approach for efficiently generating balanced bipartite graph and register allocation algorithms for the dual-bank register allocation in IXPs.  ...  Acknowledgement Feilong Tang would like to thank The Japan Society for the Promotion of Science (JSPS) and The Un-  ... 
doi:10.1155/2010/986192 fatcat:llofit5ce5hq5bg5lqu53v6rmu

Early Periodic Register Allocation on ILP Processors

Sid-Ahmed-Ali TOUATI, Christine EISENBEIS
2004 Parallel Processing Letters  
In this new graph, we are able to fix the register pressure, measured as the number of simultaneously alive variables in any schedule.  ...  After scheduling, register allocation is done on conventional register sets or on rotating register files.  ...  Introduction This article addresses the problem of register pressure in simple loop data dependence graphs (DDGs), with multiple register types and non unit assumed latencies operations.  ... 
doi:10.1142/s012962640400188x fatcat:6glu4jl4b5bwliskv36en2mswa

On the Exploitation of Narrow-Width Values for Improving Register File Reliability

Jie Hu, Shuai Wang, S.G. Ziavras
2009 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
A detailed architectural vulnerability factor (AVF) analysis shows that IRD significantly reduces the AVF from 8.4% in a conventional unprotected register file to 0.1% in an IRD register file.  ...  Since the register file is in the critical path of the processor pipeline, any reliable design that increases either the pressure on the register file or the register file access latency is not desirable  ...  reuse to reduce resource redundancy in the register file.  ... 
doi:10.1109/tvlsi.2009.2017441 fatcat:ryrga5la2nfplaaafqpvrgf4te

Understanding the GPU Microarchitecture to Achieve Bare-Metal Performance Tuning

Xiuxia Zhang, Guangming Tan, Shuangbai Xue, Jiajia Li, Keren Zhou, Mingyu Chen
2017 Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming - PPoPP '17  
The performance boost is achieved by tuning FFMA throughput by activating dual-issue, eliminating register bank conflicts, adding non-FFMA instructions with little penalty, and choosing proper width of  ...  We use SGEMM as a running example to show the ways to achieve bare-metal performance tuning.  ...  Acknowledgments We would like to thank Prof. Mary Hall and other reviewers for the very useful comments and suggestions which help us improve the quality of our paper.  ... 
doi:10.1145/3018743.3018755 fatcat:pueil5biffgtlplus57wybzysa