
Using internal redundant representations and limited bypass to support pipelined adders and register files

M.D. Brown, Y.N. Patt
Proceedings Eighth International Symposium on High Performance Computer Architecture  
Pipelined functional units and multi-cycle register files may require multi-level bypass networks to guarantee that an instruction's result is available any cycle after it is produced.  ...  This paper evaluates the use of redundant binary and pipelined 2's complement adders in out-of-order execution cores.  ...  We would also like to thank Andy Glew, Shih-Lien Lu, and Chris Wilkerson for their valuable discussions. This work was supported in part by Intel and IBM.  ... 
doi:10.1109/hpca.2002.995718 dblp:conf/hpca/BrownP02 fatcat:petjpr522jblpp2jahwyhon65e

RENO: a rename-based instruction optimizer

V. Petric, T. Sha, A. Roth
2005 32nd International Symposium on Computer Architecture (ISCA'05)  
Alternatively, because eliminated instructions do not consume issue queue entries, physical registers, or issue, bypass, register file, and execution bandwidth, RENO can be used to absorb the performance  ...  Anne Bracy, Milo Martin, and Marci McCoy Roth helped improve the final manuscript. This work was supported by NSF CAREER award CCF-0238203.  ... 
doi:10.1109/isca.2005.43 dblp:conf/isca/X05b fatcat:u5bx5cpiabetvefjehap64ft4e

A Case for Superconducting Accelerators [article]

Swamit S. Tannu, Poulami Das, Michael L. Lewis, Robert Krick, Douglas M. Carmean, Moinuddin K. Qureshi
2019 arXiv   pre-print
While JJ-based circuits can provide high operating frequency and energy-efficiency, this technology faces three critical challenges: limited device density and lack of area-efficient technology for memory  ...  In this paper, we study the use of superconducting technology to build an accelerator for SHA-256 engines commonly used in Bitcoin mining applications.  ...  ACKNOWLEDGMENTS We thank Srilatha Manne, Elnaz Ansari, Zachary Myers for the technical discussions and feedback. This work was supported by a gift from Microsoft Research.  ... 
arXiv:1902.04641v2 fatcat:eqvbciqawnbi7mcnjghrz5hf5e

IBM POWER6 accelerators: VMX and DFU

L. Eisen, J. W. Ward, H.-W. Tast, N. Mading, J. Leenstra, S. M. Mueller, C. Jacobi, J. Preiss, E. M. Schwarz, S. R. Carlough
2007 IBM Journal of Research and Development  
The RU is used to maintain a shadow copy of the © Copyright 2007 by International Business Machines Corporation.  ...  In the VMX floorplan, the VRFs are placed close to their corresponding pipeline; this allows for a fast register file access. Loads and writes update both copies of the VRF.  ... 
doi:10.1147/rd.516.0663 fatcat:vlcskatcbzdxneysa6tn3hydhm

Formal Verification of an Iterative Low-Power x86 Floating-Point Multiplier with Redundant Feedback

Peter-Michael Seidel
2011 Electronic Proceedings in Theoretical Computer Science  
The multiplier operates iteratively and feeds back intermediate results in redundant representation.  ...  It supports x87 and SSE instructions in various precisions and can block the issuing of new instructions.  ...  The second and third pipeline stages consist of combined addition and rounding followed by result selection, formatting for different precisions, and forwarding of the result to the register file and bypass  ... 
doi:10.4204/eptcs.70.6 fatcat:2aln6b65vjbpxmjr5vfsj2nkjm

Low-fat pointers

Albert Kwon, Udit Dhawan, Jonathan M. Smith, Thomas F. Knight, Andre DeHon
2013 Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security - CCS '13  
This, in turn, allows the pointers to be used as capabilities to facilitate fine-grained access control and fast security domain crossing.  ...  , floating-point adder.  ...  PIPELINING AND BYPASSING The SAFElite has four pipeline stages as shown in Fig. 2.  ... 
doi:10.1145/2508859.2516713 dblp:conf/ccs/KwonDSKD13 fatcat:42xz3xkokrf3zpvejd633tydlq

Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support

Dimitri Tan, Carl E. Lemonds, Michael J. Schulte
2009 IEEE transactions on computers  
The FPM efficiently supports multiple precisions using an iterative rectangular multiplier.  ...  The iterative FPM also supports division, square-root, and transcendental functions.  ...  their excellent work on the multiplier and rounding circuitry implementation, Raj Desikan for his excellent work on the performance modeling and analysis, and to the anonymous reviewers for their helpful  ... 
doi:10.1109/tc.2008.203 fatcat:vbpadgopazf3pe4lm76inknk2y

Exploiting partial operand knowledge

B.R. Mestan, M.H. Lipasti
2003 2003 International Conference on Parallel Processing, 2003. Proceedings.  
We find that a bit-slice design using two 16-bit slices achieves IPC within 1% of an ideal design and attains a 16% speed-up over a conventional pipelined design not using partial operands.  ...  [Figure (a): Non-pipelined Execution Stage; Dependent Instructions Observe End-to-End Latency]  ...  Acknowledgements This work was supported in part by the National Science Foundation with grants CCR-0073440, CCR-0083126, EIA-0103670, and CCR-0133437, and generous financial support and equipment donations  ... 
doi:10.1109/icpp.2003.1240601 dblp:conf/icpp/MestanL03 fatcat:6zzfyoulsre4tcu32xu6mgeloi

Low-Power, High-Performance TTA Processor for 1024-Point Fast Fourier Transform [chapter]

Teemu Pitkänen, Risto Mäkinen, Jari Heikkinen, Tero Partanen, Jarmo Takala
2006 Lecture Notes in Computer Science  
The proposed processor supports three different transform lengths by bypassing the input to the correct pipeline stage.  ...  Related Work Digital signal processors offer flexibility and, therefore, low development costs but at the expense of limited performance and typically high power dissipation.  ...  Acknowledgement This work has been supported by the Academy of Finland under project 205743 and the National Technology Agency of Finland under research funding decision 40153/05.  ... 
doi:10.1007/11796435_24 fatcat:pkf5fobe4rdwndg5k3cja54lua

End-to-end register data-flow continuous self-test

Javier Carretero, Pedro Chaparro, Xavier Vera, Jaume Abella, Antonio González
2009 Proceedings of the 36th annual international symposium on Computer architecture - ISCA '09  
MOB data and addresses, register file logic, register file storage and functional units.  ...  The structures protected include the issue queue logic and the data associated (i.e., tags, control signals), input multiplexors, rename data, replay logic, register free list, bypasses data and logic,  ...  Acknowledgments We would like to thank Sorin Iacobovici and Abhijit Jas from Intel for the interesting and thorough discussions while we elaborated this work.  ... 
doi:10.1145/1555754.1555770 dblp:conf/isca/CarreteroCVAG09 fatcat:glv7oi6s7rdupcdhqyjbc2vb6y

Exploring circuit timing-aware language and compilation

Giang Hoang, Robby Bruce Findler, Russ Joseph
2011 Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems - ASPLOS '11  
While there has been growing interest in systems that leverage circuit-level timing speculation to improve the performance and power-efficiency of processors, most of the innovation has been at the microarchitectural  ...  By adjusting the design of the ISA and enabling circuit timing-sensitive optimizations in a compiler, we can more effectively exploit timing speculation.  ...  Acknowledgments We would like to thank anonymous reviewers and Antonio Gonzalez for their helpful comments. This work is in part supported by NSF CAREER CCF-0644332 and NSF CNS-0720820.  ... 
doi:10.1145/1950365.1950405 dblp:conf/asplos/HoangFJ11 fatcat:neio2onoc5efxiillofquxcgb4

A 200-MHz 64-b dual-issue CMOS microprocessor

D.W. Dobberpuhl, R.T. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R.A. Conrad, D.E. Dever, B. Gieseke, S.M.N. Hassoun, G.W. Hoeppner (+11 others)
1992 IEEE Journal of Solid-State Circuits  
The chip includes separate 8-kilobyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types.  ...  It is designed to execute two instructions per cycle among scoreboarded integer, floating-point, address, and branch execution units. Power dissipation is 30 W at 200-MHz operation.  ...  -entry register file, and a pipelined floating-point unit (FPU) with an additional 32 registers.  ... 
doi:10.1109/4.165336 fatcat:zugbyrtgobdzvdiyotfuck5huq

An innovative low-power high-performance programmable signal processor for digital communications

J. H. Moreno, V. Zyuban, U. Shvadron, F. D. Neeser, J. H. Derby, M. S. Ware, K. Kailas, A. Zaks, A. Geva, S. Ben-David, S. W. Asaad, T. W. Fox (+4 others)
2003 IBM Journal of Research and Development  
These are applications where the fundamental limiting factor is the power available to support electronics, but where performance requirements (in terms of instructions executed per second, MIPS) are also  ...  We describe the methodology used in the development of the processor, highlighting the techniques deployed to enable architecture/compiler/implementation co-development, and the approach used for power-performance  ...  Instruction selection The internal representation of the compiler uses primitive operations similar to those found in RISC processors.  ... 
doi:10.1147/rd.472.0299 fatcat:llzoroyazfawpdigtd7wts4usu

Multi-granular Arithmetic in a Coarse-Grain Reconfigurable Architecture

Stef Louwers, Luc Waeijen, Mark Wijtvliet, Ruud Koolen, Henk Corporaal
2016 2016 Euromicro Conference on Digital System Design (DSD)  
If this width is too narrow, not all operations are natively possible, and software support is required to calculate larger operations.  ...  Using a silicon synthesis-toolflow analysis, we demonstrate the ability to perform a narrow multiplication at an energy cost 15 times lower than the native alternative under realistic conditions, with  ...  The functional units are not internally pipelined; there is only an output register.  ... 
doi:10.1109/dsd.2016.98 dblp:conf/dsd/LouwersWWKC16 fatcat:kp4i52q62rclpjbk2rsr3jkf2a

Microarchitectural Transformations Using Elasticity

Marc Galceran-Oms, Alexander Gotmanov, Jordi Cortadella, Mike Kishinevsky
2011 ACM Journal on Emerging Technologies in Computing Systems  
This article reveals how elasticity can be effectively and practically used to derive pipelined circuits by using correct-by-construction transformations that can be fully automated.  ...  Pipelining is one of the classical techniques to improve the throughput of a circuit.  ...  and then to the adder.  ... 
doi:10.1145/2043643.2043648 fatcat:66od7s6xcrf6hfktske3ztjyly