Design and Analysis of On-Chip CPU Pipelined Caches [chapter]

C. Ninos, H. T. Vergos, D. Nikolos
2000 IFIP Advances in Information and Communication Technology  
The only way to reduce the effect of cache access time on processor cycle time is the use of pipelined caches. A timing model for on-chip caches has recently been presented in [1].  ...  The speedup of the pipelined cache against the non-pipelined one is examined as a function of the pipeline depth, the organization and the physical implementation parameters.  ...  Pipelined caches can effectively reduce the cycle time of the CPU. In this paper the design and analysis of pipelined CPU caches was studied.  ... 
doi:10.1007/978-0-387-35498-9_15 fatcat:6ookn5cndja4jpwtefs6kukw3m

Multilevel optimization of pipelined caches

K. Olukotun, T.N. Mudge, R.B. Brown
1997 IEEE transactions on computers  
The results of this design exercise show that, because processors with pipelined caches can have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two or  ...  The solution combines trace-driven architectural simulations and the timing analysis of the physical implementation of the cache.  ...  More recently, functions for a variety of on-chip cache organizations have been developed [21] that could be used for a similar analysis of on-chip caches.  ... 
doi:10.1109/12.628394 fatcat:3cfie5jlzze6pglaepb2cw76iq

The NVAX and NVAX+ High-Performance VAX Microprocessor

G. Michael Uhler, Debra Bernstein, Larry L. Biro, John F. Brown III, John H. Edmondson, Jeffrey D. Pickholtz, Rebecca L. Stamm
1992 Digital technical journal of Digital Equipment Corporation  
The design evolved throughout the project as time-to-market, performance, and complexity trade-offs were made. Special design features address the issues of debug, maintenance, and analysis.  ...  The NVAX and NVAX+ CPU chips are high-performance VAX microprocessors that use techniques traditionally associated with RISC microprocessor designs to dramatically improve VAX performance.  ...  Number of CPU Chips The VAX 6000 Model 400 core CPU implementation is a three-chip design: a processor chip, with a small on-chip primary cache; a floating-point chip; and a secondary cache controller,  ... 
dblp:journals/dtj/UhlerBBBEPS92 fatcat:pqjj2y5igvcr7mszriq4z35iym

A Comparative Study of Heterogeneous Processor Simulators

Shagufta S., Muhammad Aleem, Muhammad Arshad, Muhammad Azhar
2016 International Journal of Computer Applications  
With the addition of more transistors on a single-chip, a processor's energy consumption increases exponentially. The solution to this problem is heterogeneous processors and machines.  ...  In this study, we present a detailed comparative analysis of gem5-gpu, gem5, and multi2sim simulators.  ...  The on-chip memory is joined with the processor on the same chip i.e., instruction cache, data cache and on-chip SRAM.  ... 
doi:10.5120/ijca2016911316 fatcat:t7532ev45nhu7m3pt4r4fwqjna

UltraSPARC-IIi: expanding the boundaries of a system on a chip

K.B. Normoyle, M.A. Csoppenszky, A. Tzeng, T.P. Johnson, C.D. Furman, J. Mostoufi
1998 IEEE Micro  
Stated differently, the CPU must deliver a lot of performance for the least impact on overall system cost and also enable simplified system design.  ...  The central mission of the UltraSPARC-IIi is optimized price/performance and ease of use for the system designer.  ...  Acknowledgments Tremendous thanks and appreciation go to the entire UltraSPARC-IIi project team. Without the hard work of many people, UltraSPARC-IIi would not exist today.  ... 
doi:10.1109/40.671399 fatcat:sc3kknyvbfgmlk5rpda55k4bji

A 20-MIPS sustained 32-bit CMOS microprocessor with high ratio of sustained to peak performance

N.P. Jouppi, J.Y.-F. Tang
1989 IEEE Journal of Solid-State Circuits  
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with  ...  Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems.  ...  Bob Alverson, Scott Nettles, and Don Stark provided CAD tools instrumental in the design of the CPU. Jeremy Dion developed the functional simulator of the CPU.  ... 
doi:10.1109/jssc.1989.572612 fatcat:ap73okvntrb6hg63afkvplylqi

Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU

William J. Bowhill, Shane L. Bell, Bradley J. Benschneider, Andrew J. Black, Sharon M. Britton, Ruben W. Castelino, Dale R. Donchin, John H. Edmondson, Harry R. Fair III, Paul E. Gronowski, Anil K. Jain, Patricia L. Kroesen (+9 others)
1995 Digital technical journal of Digital Equipment Corporation  
A 300-MHz, custom 64-bit VLSI, second-generation Alpha CPU chip has been developed. The chip was designed in a 0.5-μm CMOS technology using four levels of metal.  ...  It contains an 8-KB instruction cache; an 8-KB data cache; and a 96-KB unified second-level cache. The chip can issue four instructions per cycle and delivers 1,200 mips/600 MFLOPS (peak).  ...  ACKNOWLEDGMENTS We would like to acknowledge the contributions of many people who helped make this chip possible.  ... 
dblp:journals/dtj/BowhillBBBBCDEFGJKLLMMPSSST95 fatcat:sk6wveyqwrbd7kdy2a5twmdnve

Simultaneous multithreading support in embedded distributed memory MPSoCs

Rafael Garibotti, Luciano Ost, Remi Busseuil, Mamady kourouma, Chris Adeniyi-Jones, Gilles Sassatelli, Michel Robert
2013 Proceedings of the 50th Annual Design Automation Conference on - DAC '13  
2.6.38 § 16kB private L1 data and instruction caches § 32bits channel width and DDR physical MJPEG video pipeline Results § Analysis on average bandwidth for MJPEG: § RMA bandwidth is the bottleneck  ...  I$ D$ L1 cache CTRL L2 cache CTRL L2 cache CTRL Memory CTRL (b) Results § Analysis on performance scalability : § ARM SMP: 1 to 8 CPUs interconnected by a bus / NoC (GEM5) § OpenScale  ... 
doi:10.1145/2463209.2488836 dblp:conf/dac/GaribottiOBkASR13 fatcat:vvmgs57zpzgw5an5clmyhpdo3m

A 200-MHz 64-bit Dual-Issue CMOS Microprocessor

Daniel W. Dobberpuhl, Richard T. Witek, Randy L. Allmon, Robert Anglin, David Bertucci, Sharon M. Britton, Linda Chao, Robert A. Conrad, Daniel E. Dever, Bruce Gieseke, Soha Hassoun, Gregory W. Hoeppner (+11 others)
1992 Digital technical journal of Digital Equipment Corporation  
The metal structure is designed to support the high operating frequency of the chip. Metal 3 is very thick and has a relatively large pitch.  ...  The chip includes an 8-kilobyte (KB) I-cache, 8KB D-cache and two associated translation buffers, a four-entry, 32-byte-per-entry write buffer, a pipelined 64-bit integer execution unit with a 32-entry  ...  This initial set of instructions is used to write the bus control registers inside the CPU chip to set the cache timing and to test the chip and module from the CPU out.  ... 
dblp:journals/dtj/DobberpuhlWAABBCCDGHHKLLMMMMPRSS92 fatcat:7stblq65tzabfajo3o443uuygy

Architectural and organizational tradeoffs in the design of the MultiTitan CPU

N. P. Jouppi
1989 SIGARCH Computer Architecture News  
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with  ...  Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems.  ...  If a separate FPU chip has its own set of registers, then loads and stores to the FPU chip can take place directly from the data cache, with the same latency as an FPU on the CPU chip.  ... 
doi:10.1145/74926.74957 fatcat:cib7kgqhyncyvaku3b6hsbdd6y

Architectural and organizational tradeoffs in the design of the MultiTitan CPU

N. P. Jouppi
1989 Proceedings of the 16th annual international symposium on Computer architecture - ISCA '89  
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with  ...  Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems.  ...  If a separate FPU chip has its own set of registers, then loads and stores to the FPU chip can take place directly from the data cache, with the same latency as an FPU on the CPU chip.  ... 
doi:10.1145/74925.74957 dblp:conf/isca/Jouppi89 fatcat:3kobxt37tfdg7igbosjwt6pnxy

A 200-MHz 64-b dual-issue CMOS microprocessor

D.W. Dobberpuhl, R.T. Witek, R. Allmon, R. Anglin, D. Bertucci, S. Britton, L. Chao, R.A. Conrad, D.E. Dever, B. Gieseke, S.M.N. Hassoun, G.W. Hoeppner (+11 others)
1992 IEEE Journal of Solid-State Circuits  
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU chip is described. The chip is fabricated in a 0.75-μm CMOS technology utilizing three levels of metalization and optimized for 3.3-V operation.  ...  The chip includes separate 8-kilobyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types.  ...  This initial set of instructions is used to write the bus control registers inside the CPU chip to set the cache timing and to test the chip and module from the CPU out.  ... 
doi:10.1109/4.165336 fatcat:zugbyrtgobdzvdiyotfuck5huq

SPARC64™ VIIIfx: Fujitsu's New Generation Octo Core Processor for PETA Scale computing

Takumi Maruyama
2010 IEEE Micro  
The three CPU cores and an 8-Mbyte block of L3 cache are grouped as one last level cache and core unit (LCU).  ...  With four LCUs, the SPARC64 XII chip includes a total of 12 cores and 32 Mbytes of L3 cache.  ...  SPARC64 XII PROCESSOR STRUCTURE OUTSIDE THE CORE The three CPU cores and an 8-Mbyte block of L3 cache are grouped as one LCU, as shown in Figure 3 .  ... 
doi:10.1109/mm.2010.4 fatcat:pjrsck7cljhz7ont5anxuprxu4

Performance monitoring in advanced computer architecture

Richard J. Enbody, Kelley Pellini, William Moore
1998 Proceedings of the 1998 workshop on Computer architecture education - WCAE '98  
The current generation of microprocessors has performance monitoring registers on chip which can be read by users.  ...  The result is real-time monitoring of processor performance, and a new opportunity for computer architecture education.  ...  Conclusion Performance monitors can be used in multiple ways; hardware design and analysis, software design and analysis, and as an educational tool.  ... 
doi:10.1145/1275182.1275199 dblp:conf/wcae/EnbodyPM98 fatcat:qucjzv7hrba6hhr5eudcbotr3a

Best-Effort FPGA Programming: A Few Steps Can Go a Long Way [article]

Jason Cong, Zhenman Fang, Yuchen Hao, Peng Wei, Cody Hao Yu, Chen Zhang, Peipei Zhou
2018 arXiv   pre-print
However, reading through recent publications on FPGA designs using HLS, one often gets the impression that FPGA programming is still hard in that it leaves programmers to explore a very large design space  ...  Moreover, we show that the refinement steps in the best-effort guideline, consisting of explicit data caching, customized pipelining, processing element duplication, computation/communication overlapping  ...  On one hand, the performances between the 64KB, 1MB and "infinite" groups are almost identical, which is consistent with our previous analysis on the design choice of caching size.  ... 
arXiv:1807.01340v1 fatcat:6ocpzvp2cvgkninbtyvvyk7yiu