Design and Analysis of On-Chip CPU Pipelined Caches
[chapter]
2000
IFIP Advances in Information and Communication Technology
The only way to reduce the effect of cache access time on processor cycle time is the use of pipelined caches. A timing model for on-chip caches has recently been presented in [1]. ...
The speedup of the pipelined cache against the non-pipelined one is examined as a function of the pipeline depth, the organization and the physical implementation parameters. ...
Pipelined caches can effectively reduce the cycle time of the CPU. In this paper the design and analysis of pipelined CPU caches was studied. ...
doi:10.1007/978-0-387-35498-9_15
fatcat:6ookn5cndja4jpwtefs6kukw3m
Multilevel optimization of pipelined caches
1997
IEEE transactions on computers
The results of this design exercise show that, because processors with pipelined caches can have shorter CPU cycle times and larger caches, a significant performance advantage is gained by using two or ...
The solution combines trace-driven architectural simulations and the timing analysis of the physical implementation of the cache. ...
More recently, functions for a variety of on-chip cache organizations have been developed [21] that could be used for a similar analysis of on-chip caches. ...
doi:10.1109/12.628394
fatcat:3cfie5jlzze6pglaepb2cw76iq
The NVAX and NVAX+ High-Performance VAX Microprocessor
1992
Digital technical journal of Digital Equipment Corporation
The design evolved throughout the project as time-to-market, performance, and complexity trade-offs were made. Special design features address the issues of debug, maintenance, and analysis. ...
The NVAX and NVAX+ CPU chips are high-performance VAX microprocessors that use techniques traditionally associated with RISC microprocessor designs to dramatically improve VAX performance. ...
Number of CPU Chips The VAX 6000 Model 400 core CPU implementation is a three-chip design: a processor chip, with a small on-chip primary cache; a floating-point chip; and a secondary cache controller, ...
dblp:journals/dtj/UhlerBBBEPS92
fatcat:pqjj2y5igvcr7mszriq4z35iym
A Comparative Study of Heterogeneous Processor Simulators
2016
International Journal of Computer Applications
With the addition of more transistors on a single-chip, a processor's energy consumption increases exponentially. The solution to this problem is heterogeneous processors and machines. ...
In this study, we present a detailed comparative analysis of gem5-gpu, gem5, and multi2sim simulators. ...
The on-chip memory is joined with the processor on the same chip i.e., instruction cache, data cache and on-chip SRAM. ...
doi:10.5120/ijca2016911316
fatcat:t7532ev45nhu7m3pt4r4fwqjna
UltraSPARC-IIi: expanding the boundaries of a system on a chip
1998
IEEE Micro
Stated differently, the CPU must deliver a lot of performance for the least impact on overall system cost and also enable simplified system design. ...
The central mission of the UltraSPARC-IIi is optimized price/performance and ease of use for the system designer. ...
Acknowledgments Tremendous thanks and appreciation go to the entire UltraSPARC-IIi project team. Without the hard work of many people, UltraSPARC-IIi would not exist today. ...
doi:10.1109/40.671399
fatcat:sc3kknyvbfgmlk5rpda55k4bji
A 20-MIPS sustained 32-bit CMOS microprocessor with high ratio of sustained to peak performance
1989
IEEE Journal of Solid-State Circuits
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with ...
Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. ...
Bob Alverson, Scott Nettles, and Don Stark provided CAD tools instrumental in the design of the CPU. Jeremy Dion developed the functional simulator of the CPU. ...
doi:10.1109/jssc.1989.572612
fatcat:ap73okvntrb6hg63afkvplylqi
Circuit Implementation of a 300-MHz 64-bit Second-generation CMOS Alpha CPU
1995
Digital technical journal of Digital Equipment Corporation
A 300-MHz, custom 64-bit VLSI, second-generation Alpha CPU chip has been developed. The chip was designed in a 0.5-µm CMOS technology using four levels of metal. ...
It contains an 8-KB instruction cache; an 8-KB data cache; and a 96-KB unified second-level cache. The chip can issue four instructions per cycle and delivers 1,200 mips/600 MFLOPS (peak). ...
ACKNOWLEDGMENTS We would like to acknowledge the contributions of many people who helped make this chip possible. ...
dblp:journals/dtj/BowhillBBBBCDEFGJKLLMMPSSST95
fatcat:sk6wveyqwrbd7kdy2a5twmdnve
Simultaneous multithreading support in embedded distributed memory MPSoCs
2013
Proceedings of the 50th Annual Design Automation Conference on - DAC '13
2.6.38; 16 kB private L1 data and instruction caches; 32-bit channel width and DDR physical. MJPEG video pipeline results: analysis of average bandwidth for MJPEG: RMA bandwidth is the bottleneck ...
[Figure (b): I$, D$, L1 cache CTRL, L2 cache CTRL, Memory CTRL]
Results: analysis of performance scalability: ARM SMP: 1 to 8 CPUs interconnected by a bus / NoC (GEM5); OpenScale ...
doi:10.1145/2463209.2488836
dblp:conf/dac/GaribottiOBkASR13
fatcat:vvmgs57zpzgw5an5clmyhpdo3m
A 200-MHz 64-bit Dual-Issue CMOS Microprocessor
1992
Digital technical journal of Digital Equipment Corporation
The metal structure is designed to support the high operating frequency of the chip. Metal 3 is very thick and has a relatively large pitch. ...
The chip includes an 8-kilobyte (KB) I-cache, 8-KB D-cache and two associated translation buffers, a four-entry, 32-byte-per-entry write buffer, a pipelined 64-bit integer execution unit with a 32-entry ...
This initial set of instructions is used to write the bus control registers inside the CPU chip to set the cache timing and to test the chip and module from the CPU out. ...
dblp:journals/dtj/DobberpuhlWAABBCCDGHHKLLMMMMPRSS92
fatcat:7stblq65tzabfajo3o443uuygy
Architectural and organizational tradeoffs in the design of the MultiTitan CPU
1989
SIGARCH Computer Architecture News
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with ...
Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. ...
If a separate FPU chip has its own set of registers, then loads and stores to the FPU chip can take place directly from the data cache, with the same latency as an FPU on the CPU chip. ...
doi:10.1145/74926.74957
fatcat:cib7kgqhyncyvaku3b6hsbdd6y
Architectural and organizational tradeoffs in the design of the MultiTitan CPU
1989
Proceedings of the 16th annual international symposium on Computer architecture - ISCA '89
We do work in the design, fabrication and packaging of hardware; language processing and scaling issues in system software design; and the exploration of new applications areas that are opening up with ...
Our focus is computer science research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. ...
If a separate FPU chip has its own set of registers, then loads and stores to the FPU chip can take place directly from the data cache, with the same latency as an FPU on the CPU chip. ...
doi:10.1145/74925.74957
dblp:conf/isca/Jouppi89
fatcat:3kobxt37tfdg7igbosjwt6pnxy
A 200-MHz 64-b dual-issue CMOS microprocessor
1992
IEEE Journal of Solid-State Circuits
A 400-MIPS/200-MFLOPS (peak) custom 64-b VLSI CPU chip is described. The chip is fabricated in a 0.75-µm CMOS technology utilizing three levels of metalization and optimized for 3.3-V operation. ...
The chip includes separate 8-kilobyte instruction and data caches and a fully pipelined floating-point unit (FPU) that can handle both IEEE and VAX standard floating-point data types. ...
This initial set of instructions is used to write the bus control registers inside the CPU chip to set the cache timing and to test the chip and module from the CPU out. ...
doi:10.1109/4.165336
fatcat:zugbyrtgobdzvdiyotfuck5huq
SPARC64™ VIIIfx: Fujitsu's New Generation Octo Core Processor for PETA Scale computing
2010
IEEE Micro
The three CPU cores and an 8-Mbyte block of L3 cache are grouped as one last level cache and core unit (LCU). ...
With four LCUs, the SPARC64 XII chip includes a total of 12 cores and 32 Mbytes of L3 cache. ...
SPARC64 XII PROCESSOR STRUCTURE OUTSIDE THE CORE The three CPU cores and an 8-Mbyte block of L3 cache are grouped as one LCU, as shown in Figure 3. ...
doi:10.1109/mm.2010.4
fatcat:pjrsck7cljhz7ont5anxuprxu4
Performance monitoring in advanced computer architecture
1998
Proceedings of the 1998 workshop on Computer architecture education - WCAE '98
The current generation of microprocessors has performance monitoring registers on chip which can be read by users. ...
The result is real-time monitoring of processor performance, and a new opportunity for computer architecture education. ...
Conclusion Performance monitors can be used in multiple ways; hardware design and analysis, software design and analysis, and as an educational tool. ...
doi:10.1145/1275182.1275199
dblp:conf/wcae/EnbodyPM98
fatcat:qucjzv7hrba6hhr5eudcbotr3a
Best-Effort FPGA Programming: A Few Steps Can Go a Long Way
[article]
2018
arXiv
pre-print
However, reading through recent publications on FPGA designs using HLS, one often gets the impression that FPGA programming is still hard in that it leaves programmers to explore a very large design space ...
Moreover, we show that the refinement steps in the best-effort guideline, consisting of explicit data caching, customized pipelining, processing element duplication, computation/communication overlapping ...
On one hand, the performances between the 64KB, 1MB and "infinite" groups are almost identical, which is consistent with our previous analysis on the design choice of caching size. ...
arXiv:1807.01340v1
fatcat:6ocpzvp2cvgkninbtyvvyk7yiu