Stretching the Limits of Clock-Gating Efficiency in Server-Class Processors

H. Jacobson, P. Bose, Zhigang Hu, A. Buyuktosunoglu, V. Zyuban, R. Eickemeyer, L. Eisen, J. Griswell, D. Logan, B. Sinharoy, J. Tendler
11th International Symposium on High-Performance Computer Architecture  
In this paper we first examine the realistic benefits and limits of clock-gating in current generation high-performance processors (e.g. of the POWER4™ or POWER5™ class).  ...  Based on our experiences with current designs, we try to bound the practical limits of clock gating efficiency in future microprocessors.  ...  This case is explored here in the context of a floating point unit of a current generation server class microprocessor.  ... 
doi:10.1109/hpca.2005.33 dblp:conf/hpca/JacobsonBHBZEEGLST05 fatcat:7qojwvbt4fbxxgia2hhqpffexe

"Timing closure by design," a high frequency microprocessor design methodology

S. Posluszny, K. Lee, D. Meltzer, K. Nowka, J. Park, J. Peter, J. Silberman, O. Takahashi, P. Villarrubia, N. Aoki, D. Boerstler, P. Coulman (+5 others)
2000 Proceedings of the 37th conference on Design automation - DAC '00  
Characteristics of "Timing Closure by Design" are 1) logic partitioned on timing boundaries, 2) predictable control structures (PLAs), 3) static interfaces for dynamic circuits, 4) low skew clock distribution  ...  This methodology was used to design a Gigahertz class PowerPC microprocessor with 19 million transistors.  ...  A previous 1.0 Gigahertz integer processor [2, 3, 7] was built using many of the same concepts described in this paper.  ... 
doi:10.1145/337292.337749 dblp:conf/dac/PoslusznyABCDFHKKLMNPPSTV00 fatcat:s7noxkmoyrhwpljfqll7pwqdka

Specification and analysis of power-managed systems

A. Bogliolo, L. Benini, E. Lattanzi, G. De Micheli
2004 Proceedings of the IEEE  
The structure of these DESs is specified in terms of physical states (representing operation modes) and events (triggering state transitions), while system behavior is specified in terms of next-event  ...  the Intel Xscale processor architecture, a multitasking real-time system, and a sensor network.  ...  ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for comments and suggestions.  ... 
doi:10.1109/jproc.2004.831207 fatcat:oo26pcuqdvbjda22s4dn6gjaci

Evaluation of the Raw Microprocessor

Michael Bedford Taylor, James Psota, Arvind Saraf, Nathan Shnidman, Volker Strumpen, Matt Frank, Saman Amarasinghe, Anant Agarwal, Walter Lee, Jason Miller, David Wentzlaff, Ian Bratt (+4 others)
2004 SIGARCH Computer Architecture News  
In contrast, it is now well known that the delay of the interconnect inside traditional microprocessors limits scalability [36, 1, 15, 38, 45].  ...  Our evaluation attempts to determine the extent to which Raw succeeds in meeting its goal of serving as a more versatile, general-purpose processor.  ...  We chose a server farm with 16 P3s as our best-in-class server system. Notice that a single-chip Raw system comes within a factor of three of this server farm for most applications.  ... 
doi:10.1145/1028176.1006733 fatcat:rdy5winvrjdlvawhqj6wgiozai

Effect of increasing chip density on the evolution of computer architectures

R. Nair
2002 IBM Journal of Research and Development  
However, the efficiency of use of transistors in this manner is not high.  ...  The second involves integration on the same chip of varied structures such as processors, DRAM, sensors, and transducers, which in the past required different processing capabilities-commonly referred  ...  Acknowledgment The author wishes to thank Jim Smith, Monty Denneau, Eric Kronstadt, and Dan Prener for many useful discussions and for valuable feedback on earlier versions of this manuscript.  ... 
doi:10.1147/rd.462.0223 fatcat:th2tvpf7ajfrxmfgs4mjznkud4

The PowerNap Server Architecture

David Meisner, Brian T. Gold, Thomas F. Wenisch
2011 ACM Transactions on Computer Systems  
high conversion efficiency across the entire range of PowerNap's power demands.  ...  Much of this energy is wasted in idle systems: in typical deployments, server utilization is below 30%, but idle servers still consume 60% of their peak power draw.  ...  assistance in collecting the Cluster utilization trace, Laura Falk for assistance in collecting the departmental server utilization  ... 
doi:10.1145/1925109.1925112 fatcat:pku3zhsd65fydjdosq3e4pwuai

In-Datacenter Performance Analysis of a Tensor Processing Unit [article]

Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao (+59 others)
2017 arXiv   pre-print
We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.  ...  This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN).  ...  We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.  ... 
arXiv:1704.04760v1 fatcat:btodsh4crratffycyq2frubd44

Simulation-based HW/SW co-debugging for field-programmable systems-on-chip

Ruediger Willenberg, Paul Chow
2013 2013 23rd International Conference on Field programmable Logic and Applications  
This enables free-roaming investigation of hardware-software interactions inside the system, including reverting back to an earlier point in simulation time.  ...  SimXMD is open source, and its modular architecture facilitates extension to other embedded processors as well as different simulators and debuggers.  ...  does not work for the Xilinx SDK debugger because of limitations in Eclipse's command-line interface.  ... 
doi:10.1109/fpl.2013.6645542 dblp:conf/fpl/WillenbergC13 fatcat:uwptubuia5bafneowghochz5hq

ThermOS: System support for dynamic thermal management of chip multi-processors

Filippo Sironi, Martina Maggio, Riccardo Cattaneo, Giovanni F. Del Nero, Donatella Sciuto, Marco D. Santambrogio
2013 Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques  
Constraining the temperature of computing systems has become a dominant aspect in the design of integrated circuits.  ...  This results in an increased number of transistors per unit of area and hence a growing power density.  ...  This work was partially supported by the Swedish Research Council through the LCCC Linnaeus Center.  ... 
doi:10.1109/pact.2013.6618802 dblp:conf/IEEEpact/SironiMCNSS13 fatcat:hfdq7qpvwnglpn7oiuhi7vrawq

Limitations and challenges of computer-aided design technology for CMOS VLSI

R.E. Bryant, Kwang-Ting Cheng, A.B. Kahng, K. Keutzer, W. Maly, R. Newton, L. Pileggi, J.M. Rabaey, A. Sangiovanni-Vincentelli
2001 Proceedings of the IEEE  
While manufacturing technology faces fundamental limits inherent in physical laws or material properties, design technology faces fundamental limitations inherent in the computational intractability of  ...  In this paper, we explore limitations to how design technology can enable the implementation of single-chip microelectronic systems that take full advantage of manufacturing technology with respect to  ...  complex part types (at the limits of Moore's Law or the ITRS), with many more energy-power cost-efficient, medium-complexity chips [O(10-100 M) gates in 50-nm technology], working concurrently to implement  ... 
doi:10.1109/5.915378 fatcat:jocv62sorfbnjp53u7b76j4mdi

Heterogeneous Multi-core Architectures

Tulika Mitra
2015 IPSJ Transactions on System LSI Design Methodology  
But the failure of Dennard scaling has brought the computing community to a crossroad where power has become the major limiting factor.  ...  This article presents an overview of the state-of-the-art in heterogeneous multi-core landscape.  ...  Examples of commercial customizable processors include Tensilica Xtensa core [43] and Stretch software configurable processor [4]. The primary challenge in processor customization is to automate the  ... 
doi:10.2197/ipsjtsldm.8.51 fatcat:wgiuptlmvvgnhdt2bjrcio6oqi

Toward a multiple clock/voltage island design style for power-aware processors

E. Talpes, D. Marculescu
2005 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
the proper granularity for the choice of voltage/frequency islands in case of superscalar, out-of-order processors.  ...  Enabled by the continuous advancement in fabrication technology, present-day synchronous microprocessors include more than 100 million transistors and have clock speeds well in excess of the 1-GHz mark  ...  ACKNOWLEDGMENT The authors would like to thank Anoop Iyer for his contribution to the initial version of the GALS simulation environment.  ... 
doi:10.1109/tvlsi.2005.844305 fatcat:7azarbcc4rbzfbimlassepc5ki

The MINOS data acquisition system

A. Belias, G.J. Crone, E.F. Harris, C. Howcroft, S. Madani, T.C. Nicholls, G.F. Pearce, D.E. Reyna, N. Tagg, M.A. Thomson
2004 IEEE Transactions on Nuclear Science  
We present the design of the DAQ system and report on experience gathered during early operation of the experiment.  ...  Data are read from the untriggered front-end electronics by VME single board computers and transferred across high-speed PCI data links for consolidation by data routing processors.  ...  MHz, is limited by the performance of the PCI bus in the PC architecture.  ... 
doi:10.1109/tns.2004.828518 fatcat:5soj4vdehzbsdc65hgarvr67lq

The MINOS data acquisition system

A. Belias, G.J. Crone, E. Falk Harris, C. Howcroft, S. Madani, T.C. Nicholls, G.F. Pearce, D.E. Reyna, N. Tagg, M.A. Thomson
2003 2003 IEEE Nuclear Science Symposium. Conference Record (IEEE Cat. No.03CH37515)  
We present the design of the DAQ system and report on experience gathered during early operation of the experiment.  ...  Data are read from the untriggered front-end electronics by VME single board computers and transferred across high-speed PCI data links for consolidation by data routing processors.  ...  MHz, is limited by the performance of the PCI bus in the PC architecture.  ... 
doi:10.1109/nssmic.2003.1352198 fatcat:vra6szge7zgarhnkmmmvwh4lbi

Workload-Aware Opportunistic Energy Efficiency in Multi-FPGA Platforms [article]

Sahand Salamat, Behnam Khaleghi, Mohsen Imani, Tajana Rosing
2019 arXiv   pre-print
This is in contrast to, and more efficient than, conventional approaches that merely scale (i.e., power-gate) the computing nodes or frequency.  ...  In this paper, we propose an efficient framework to throttle the power consumption of multi-FPGA platforms by dynamically scaling the voltage and hereby frequency during runtime according to prediction  ...  Moshovos's group from University of Toronto for providing the source codes of Stripes and Proteus.  ... 
arXiv:1908.06519v2 fatcat:k5v7zkkplnctdjxg6kkrdxreie