Front-end policies for improved issue efficiency in SMT processors

A. El-Moursy, D.H. Albonesi
The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings.  
In this paper, we propose new front-end policies that reduce the required integer and floating point issue queue sizes in SMT processors.  ...  For the same level of performance, the most effective policies reduce the issue queue occupancy by 33% for an SMT processor with appropriately-sized issue queue resources.  ...  Acknowledgements The authors wish to thank Dean Tullsen for the use of his simulator [19] and his help with our many questions, and the reviewers for their useful comments, especially as related to simulation  ... 
doi:10.1109/hpca.2003.1183522 dblp:conf/hpca/El-MoursyA03 fatcat:zug4cyoouffuxdnc4m5dgx3usa

Partitioning Multi-Threaded Processors with a Large Number of Threads

A. El-Moursy, R. Garg, D.H. Albonesi, S. Dwarkadas
2005 IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.  
IBM's Power5 processor is a 2-way Chip Multiprocessor (CMP) of SMT processors, each supporting 2 threads, which significantly reduces design complexity and can improve power efficiency.  ...  In a CMP organization, the gap between SMT and CMT processors shrinks further, making a CMP of CMT processors a highly viable alternative for the future.  ...  Partitioned Versus Shared Front-end In this section, we examine the effect of sharing the front-end in clustered multi-threaded processors given that the back-ends are privately assigned to the threads  ... 
doi:10.1109/ispass.2005.1430566 dblp:conf/ispass/El-MoursyGAD05 fatcat:4nwnbeei5jg7lpoo5j3664g2hq

Resource sharing control in Simultaneous MultiThreading microarchitectures

Chen Liu, Jean-Luc Gaudiot
2008 13th Asia-Pacific Computer Systems Architecture Conference
Simultaneous MultiThreading (SMT) achieves improved system resource utilization and accordingly higher instruction throughput because it exploits Thread-Level Parallelism (TLP) in addition to conventional  ...  In this work, we strive to quantitatively determine the balance between controlling resource allocation and dynamic sharing of different system resources with their impact on the performance of SMT processors  ...  [18] suggested several priority-based front-end policies for SMT microarchitectures that surpass the simple round-robin policy.  ... 
doi:10.1109/apcsac.2008.4625432 dblp:conf/aPcsac/LiuG08 fatcat:t77n5o4olvfeldybehszookmhu

Dynamic Inter-Thread Vectorization Architecture: Extracting DLP from TLP

Sajith Kalathingal, Sylvain Collange, Bharath N. Swamy, Andre Seznec
2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
DITVA extends an SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode.  ...  performance than a 4-thread 4-issue SMT architecture with AVX instructions while fetching and issuing 51% fewer instructions, achieving an overall 24% energy reduction.  ...  We leverage this instruction redundancy to mutualize the front-end pipeline of an in-order SMT processor and create vector instructions dynamically, as a resource-efficient way to improve throughput on  ... 
doi:10.1109/sbac-pad.2016.11 dblp:conf/sbac-pad/KalathingalCSS16 fatcat:cw2do6mpd5hyno4lhi4ulewofm

DITVA: Dynamic Inter-Thread Vectorization Architecture

Sajith Kalathingal, Sylvain Collange, Bharath N. Swamy, André Seznec
2018 Journal of Parallel and Distributed Computing  
For instance, when the bandwidth is increased from 2GB/s to 16GB/s, we find that memory bound applications show an improvement in performance by 3× in comparison with the baseline SMT.  ...  By assembling dynamic vector instructions at runtime, DITVA extends an in-order SMT processor with a dynamic inter-thread vector execution mode akin to the Single-Instruction, Multiple-Thread model of  ...  We leverage this instruction redundancy to mutualize the front-end pipeline of an in-order SMT processor, as a resource-efficient way to improve throughput on SPMD applications.  ... 
doi:10.1016/j.jpdc.2017.11.006 fatcat:mp5mwj7kjrbiriczcidakmfq4m

Understanding the energy efficiency of SMT and CMP with multiclustering

Jason Cong, Ashok Jagannathan, Glenn Reinman, Yuval Tamir
2005 Proceedings of the 2005 international symposium on Low power electronics and design - ISLPED '05  
Specifically, we show that the energy efficiency of CMP compared to SMT at a given performance decreases from a maximum of 25% in a monolithic processor case to 6% when the processor resources are clustered  ...  In this paper we study the energy efficiency of SMT and CMP with multiclustering.  ...  We believe that these results should hold for more aggressive SMT models which also share these front-end resources among threads.  ... 
doi:10.1145/1077603.1077616 dblp:conf/islped/CongJRT05 fatcat:hplga7wbujda5f3n4o3klyslzi

Understanding the energy efficiency of SMT and CMP with multiclustering

J. Cong, A. Jagannathan, G. Reinman, Y. Tamir
2005 ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.  
Specifically, we show that the energy efficiency of CMP compared to SMT at a given performance decreases from a maximum of 25% in a monolithic processor case to 6% when the processor resources are clustered  ...  In this paper we study the energy efficiency of SMT and CMP with multiclustering.  ...  We believe that these results should hold for more aggressive SMT models which also share these front-end resources among threads.  ... 
doi:10.1109/lpe.2005.195484 fatcat:phnp6yz2zvdlhjm2tkgosvbkvu

Addressing thermal nonuniformity in SMT workloads

Jonathan A. Winter, David H. Albonesi
2008 ACM Transactions on Architecture and Code Optimization (TACO)  
To address this, we propose and evaluate DTM mechanisms that exploit the steering-based thread management mechanisms inherent in a clustered SMT architecture.  ...  We show that in contrast to DVS, which operates globally, our techniques are more effective at controlling temperature for nonuniform workloads.  ...  With dispatch gating engaged, no further instructions are sent from the front end to the issue queues of the hot back end.  ... 
doi:10.1145/1369396.1369400 fatcat:wcg25cjdgresvnjugarliqqkwi

Per-thread cycle accounting in SMT processors

Stijn Eyerman, Lieven Eeckhout
2009 SIGARCH Computer Architecture News  
are running simultaneously on the SMT processor.  ...  This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates the execution times for each of the threads had they been executed alone, while they  ...  Stijn Eyerman and Lieven Eeckhout are Postdoctoral Fellows with the Fund for Scientific Research in Flanders (Belgium) (FWO-Vlaanderen).  ... 
doi:10.1145/2528521.1508260 fatcat:qir3y4cob5dp5crixlaebpr2vq

Per-thread cycle accounting in SMT processors

Stijn Eyerman, Lieven Eeckhout
2009 Proceeding of the 14th international conference on Architectural support for programming languages and operating systems - ASPLOS '09  
are running simultaneously on the SMT processor.  ...  This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates the execution times for each of the threads had they been executed alone, while they  ...  Stijn Eyerman and Lieven Eeckhout are Postdoctoral Fellows with the Fund for Scientific Research in Flanders (Belgium) (FWO-Vlaanderen).  ... 
doi:10.1145/1508244.1508260 dblp:conf/asplos/EyermanE09 fatcat:rl632osjnzgmxoxt6pmtditfs4

Per-thread cycle accounting in SMT processors

Stijn Eyerman, Lieven Eeckhout
2009 SIGPLAN notices  
are running simultaneously on the SMT processor.  ...  This paper proposes a cycle accounting architecture for Simultaneous Multithreading (SMT) processors that estimates the execution times for each of the threads had they been executed alone, while they  ...  Stijn Eyerman and Lieven Eeckhout are Postdoctoral Fellows with the Fund for Scientific Research in Flanders (Belgium) (FWO-Vlaanderen).  ... 
doi:10.1145/1508284.1508260 fatcat:jkzumnjppnehhfzspsilz6isf4

QoS for high-performance SMT processors in embedded systems

F.J. Cazorla, A. Ramirez, M. Valero, P.M.W. Knijnenburg, R. Sakellariou, E. Fernandez
2004 IEEE Micro  
front end to fetch from several threads, while the back end is shared.  ...  Processors for embedded systems typically are simple, with short pipelines and in-order execution.  ...  policy first fetches from threads with the fewest instructions in the processor's front end.  ... 
doi:10.1109/mm.2004.37 fatcat:pt23h6klhvd3lffgxnshrt7rse

Qsi Dynamical Fetch Policy for SMT

Shu-Chiao Yang, Jong-Jiann Shieh
2009 Zenodo  
SMT in fact was introduced as a powerful architecture to superscalar to increase the throughput of the processor.  ...  A Simultaneous Multithreading (SMT) Processor is capable of executing instructions from multiple threads in the same cycle.  ...  In fact, the fetch unit becomes one of the major bottlenecks of the SMT processor [3]. Issue logic is another candidate for bottleneck intuitively.  ... 
doi:10.5281/zenodo.1335111 fatcat:dj6fbqqkuvblrfebfrrrj433k4

DLL-conscious instruction fetch optimization for SMT processors

Fayez Mohamood, Mrinmoy Ghosh, Hsien-Hsin S. Lee
2008 Journal of systems architecture  
Simultaneous multithreading (SMT) processors can issue multiple instructions from distinct processes or threads in the same cycle.  ...  However, for an SMT processor with a virtually-indexed based cache implementation, existing instruction fetching mechanisms can induce unnecessary false cache misses caused by the DLL-based instructions  ...  In order to enable SMT, the existing front-end in TAXI was provided with additional state information to enable thread identification.  ... 
doi:10.1016/j.sysarc.2008.04.014 fatcat:mtfyd5noqbdypctqgdtp4ormti

Microarchitecture and Performance Analysis of Godson-2 SMT Processor

Zusong Li, Xianchao Xu, Weiwu Hu, Zhimin Tang
2006 Computer Design (ICCD '99), IEEE International Conference on  
The condition for implementing correct memory consistency model in Godson-2 SMT processor is studied and a new register-level sharing and synchronization scheme is proposed.  ...  This paper introduces the microarchitecture and logical implementation of SMT (Simultaneous Multithreading) improvement of Godson-2 processor which is a 64-bit, four-issue, out-of-order execution high  ...  The reason is that Xeon processor has on-chip secondary cache, and its front-end decode module is complex.  ... 
doi:10.1109/iccd.2006.4380860 dblp:conf/iccd/LiXHT06 fatcat:vd4hexqb6ja6ji6ukt5cr7tjvi