A case for a complexity-effective, width-partitioned microarchitecture

Olivier Rochecouste, Gilles Pokam, André Seznec
2006 ACM Transactions on Architecture and Code Optimization (TACO)  
...  N_regs × R_width × ω^2 × (N_read + N_write) × (N_read + 2 × N_write) cell size  ... 
doi:10.1145/1162690.1162693 fatcat:rdsvmelxbrew5mkby4r2ggdheq

Achieving Out-of-Order Performance with Almost In-Order Complexity

Francis Tseng, Yale N. Patt
2008 2008 International Symposium on Computer Architecture  
The result from processing braids is performance within 9% of a very aggressive conventional out-of-order microarchitecture with almost the complexity of an in-order implementation.  ...  However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements.  ...  This paper uses a combined compiler and microarchitecture approach to enable wider issue widths while reducing design complexity.  ... 
doi:10.1109/isca.2008.23 dblp:conf/isca/TsengP08 fatcat:ic5swgwygjgavgvqqusvpurq5q

Revisiting Clustered Microarchitecture for Future Superscalar Cores

Pierre Michaud, Andrea Mondelli, André Seznec
2015 ACM Transactions on Architecture and Code Optimization (TACO)  
For example, the Intel Nehalem microarchitecture can issue 6 micro-ops per cycle from a 36-entry issue buffer, while the more recent Intel Haswell microarchitecture can issue 8 micro-ops per cycle from  ...  We also propose a method for decreasing the energy cost of sending results from one cluster to the other cluster.  ...  Revisiting clustered microarchitecture for future superscalar cores: a case for wide-issue clusters.  ... 
doi:10.1145/2800787 fatcat:hdvi3td4azecxdjyho2asjoqcy

Achieving Out-of-Order Performance with Almost In-Order Complexity

Francis Tseng, Yale N. Patt
2008 SIGARCH Computer Architecture News  
The result from processing braids is performance within 9% of a very aggressive conventional out-of-order microarchitecture with almost the complexity of an in-order implementation.  ...  However, traditional methods of increasing issue width do not scale; that is, they drastically increase design complexity and power requirements.  ...  This paper uses a combined compiler and microarchitecture approach to enable wider issue widths while reducing design complexity.  ... 
doi:10.1145/1394608.1382169 fatcat:fhp5wiblojczbmsukhu7p5g57u

Thermal Herding: Microarchitecture Techniques for Controlling Hotspots in High-Performance 3D-Integrated Processors

Kiran Puttaswamy, Gabriel H. Loh
2007 2007 IEEE 13th International Symposium on High Performance Computer Architecture  
Our 3D/thermal-aware microarchitecture contributions include a significance-partitioned datapath that places the frequently switching 16-bits on the top die, a 3D-aware instruction scheduler allocation  ...  scheme, an address memoization approach for the load and store queues, a partial value encoding for the L1 data cache, and a branch target buffer that exploits a form of frequent partial value locality  ...  Acknowledgments Funding and equipment for this project have been provided by Intel Corporation and a grant from the Microelectronics Advanced Research Corporation (MARCO).  ... 
doi:10.1109/hpca.2007.346197 dblp:conf/hpca/PuttaswamyL07 fatcat:rztjintxqfaebbw3q6zy5strku

MASCOT: Microarchitecture synthesis of control paths

AJWM ten Berg
1994 Microprocessors and microsystems  
It transforms the initial microarchitecture into a complex microarchitecture of several PLAs and ROMs.  ...  Our strategy integrates a number of known optimization methods for specific microarchitectures.  ...  One can see that for the larger machines more complex microarchitectures are found.  ... 
doi:10.1016/0141-9331(94)90091-4 fatcat:ef2hhmulyzf5bfwbs2vmi27ria

Low-cost router microarchitecture for on-chip networks

John Kim
2009 Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture - Micro-42  
By removing the complexity of a baseline router microarchitecture, the low-cost router microarchitecture can also approach the ideal latency in on-chip networks.  ...  As a result, the on-chip network architecture will not scale properly because of design complexity.  ...  Acknowledgments We would like to thank the anonymous reviewers for their comments. This work was supported in part by the KAIST-Microsoft Research Collaboration Center (KMCC) at KAIST, Korea.  ... 
doi:10.1145/1669112.1669145 dblp:conf/micro/Kim09 fatcat:z7kdn7xwo5hs5gphversqnsisq

Clock rate versus IPC

Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger
2000 SIGARCH Computer Architecture News  
In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay.  ...  Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm  ...  First, since transistors are smaller, more can be placed on a single die, providing area for more complex microarchitectures.  ... 
doi:10.1145/342001.339691 fatcat:w56gmfvpsnbkvpkkthhy2h64km

Clock rate versus IPC

Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger
2000 Proceedings of the 27th annual international symposium on Computer architecture - ISCA '00  
In this paper, we describe technology-driven models for wire capacitance, wire delay, and microarchitectural component delay.  ...  Using the results of these models, we measure the simulated performance (estimating both clock rate and IPC) of an aggressive out-of-order microarchitecture as it is scaled from a 250nm technology to a 35nm  ...  First, since transistors are smaller, more can be placed on a single die, providing area for more complex microarchitectures.  ... 
doi:10.1145/339647.339691 fatcat:d2tvtxr6zba7xfsxldilcbyzna

A hybrid fixed-function and microprocessor solution for high-throughput broad-phase collision detection

Muiris Woulfe, Michael Manzke
2016 EURASIP Journal on Embedded Systems  
The two microarchitectures are combined with the remainder of the system through an original method for sharing data between a ray tracer and the collision-detection microarchitectures to minimise data  ...  These benchmarks reveal that, for over one million objects, our design achieves an acceleration of 812× relative to a CPU and an acceleration of 161× relative to a GPU.  ...  Doyle for providing guidance on the ray-tracing elements of this article.  ... 
doi:10.1186/s13639-016-0037-7 fatcat:g55254vfendxjld7i2uwap4yhm

Superspeculative microarchitecture for beyond AD 2000

M.H. Lipasti, J.P. Shen
1997 Computer  
The experimental, superspeculative microarchitecture Superflow has a potential performance of 9.0 instructions per cycle and realizable performance of 7.3 IPC for the SPEC95 integer suite, without requiring  ...  Researchers have proposed reconfigurable computers that employ large arrays of highly programmable build-  ...  Employing a broad spectrum of superspeculative techniques can achieve significant performance increases  ...  For adequate load instruction throughput, we introduce load stream partitioning, a divide-and-conquer strategy for reducing the cost and complexity of a high-bandwidth memory system.  ... 
doi:10.1109/2.612250 fatcat:ezukhsogtvcnjfga5hqpj5jcsi

Reducing wire delay penalty through value prediction

Joan-Manuel Parcerisa, Antonio González
2000 Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture - MICRO 33  
We show that value prediction reduces the penalties caused by inter-cluster communication by 18% on average for a realistic implementation of a 4-cluster microarchitecture.  ...  Only in the case of misprediction, the long wire delay is experienced. We apply this concept to a clustered microarchitecture in order to reduce inter-cluster communication.  ...  Acknowledgements We thank the anonymous referees for their valuable comments.  ... 
doi:10.1145/360128.360163 fatcat:2jbadcdudbcjfii5ck2ue4ky4u

Hardware Accelerated Broad Phase Collision Detection for Realtime Simulations [article]

Muiris Woulfe, John Dingliana, Michael Manzke
2007 Workshop on Virtual Reality Interactions and Physical Simulations  
To overcome this hurdle, we propose a novel microarchitecture for performing broad phase collision detection using Axis-Aligned Bounding Boxes (AABBs), which exploits the parallelism available in the  ...  We have implemented our microarchitecture on a Field-Programmable Gate Array (FPGA) and our results show that this implementation is capable of achieving an acceleration of up to 1.5× over the broad  ...  Acknowledgements This research is supported by the Irish Research Council for Science, Engineering and Technology (IRCSET) funded by the National Development Plan (NDP).  ... 
doi:10.2312/pe/vriphys/vriphys07/079-088 dblp:conf/vriphys/WoulfeDM07 fatcat:bnuymbpq65exjj57d7mgjciai4

Automated accelerator generation and optimization with composable, parallel and pipeline architecture

Jason Cong, Peng Wei, Cody Hao Yu, Peng Zhang
2018 Proceedings of the 55th Annual Design Automation Conference on - DAC '18  
Such a well-defined template is able to support efficient accelerator designs for a broad class of computation kernels, and more importantly, drastically reduce the design space.  ...  Specifically, we first propose the composable, parallel and pipeline (CPP) microarchitecture as a template of accelerator designs.  ...  Featuring the CPP microarchitecture, a fast analytical model-based design space exploration and automatic code transformation, AutoAccel achieves 72× speed-up and 260.4× energy improvement for a broad  ... 
doi:10.1145/3195970.3195999 dblp:conf/dac/CongWYZ18 fatcat:ezisbhayq5hlxfoko437bljdne

A multi-level approach to reduce the impact of NBTI on processor functional units

Taniya Siddiqua, Sudhanva Gurumurthi
2010 Proceedings of the 20th symposium on Great lakes symposium on VLSI - GLSVLSI '10  
In this paper, we propose a multi-level optimization approach, combining techniques at the circuit and microarchitecture levels, for reducing the impact of NBTI on the functional units (FUs) of a high-performance  ...  We then propose a set of NBTI-aware dynamic instruction scheduling policies at the microarchitecture level and quantify their impact on application performance and guardband reduction through execution-driven  ...  For example, in the case of Figure 1, the input to the MUXes will determine whether the FU will be used as a 64-bit FU or a 32-bit FU. Each segment is connected to ground via a footer device.  ... 
doi:10.1145/1785481.1785498 dblp:conf/glvlsi/SiddiquaG10 fatcat:kyituy3645eqbnldovwtj766yi