
Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors.  ...  for multiple contexts.  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

Thread Cluster Memory Scheduling

Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter
2011 IEEE Micro  
Yoongu Kim is supported by a PhD fellowship from the Korea Foundation for Advanced Studies. We gratefully acknowledge the support of the Gigascale Systems Research Center, Intel, and CyLab.  ...  This research was partially supported by a National Science Foundation Career Award (CCF-0953246).  ...  Lee et al. describe a mechanism to adaptively prioritize between prefetch and demand requests in a memory scheduler [17]; their mechanism can be combined with ours.  ...  buffer called the row buffer.  ... 
doi:10.1109/mm.2011.15 fatcat:wfphgdbeprg25nwcphjipx7z2q

Global Multi-Threaded Instruction Scheduling

Guilherme Ottoni, David August
2007 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007)  
Using a fully automatic compiler implementation of GREMIO and a validated processor model, this paper demonstrates gains for a dual-core CMP model running a variety of codes.  ...  In this paper, we first propose a framework that enables global multi-threaded instruction scheduling in general. We then describe GREMIO, a scheduler built using this framework.  ...  Acknowledgments We thank the entire Liberty Research Group and Vivek Sarkar for their feedback during this work. Additionally, we thank the anonymous reviewers for their insightful comments.  ... 
doi:10.1109/micro.2007.32 dblp:conf/micro/OttoniA07 fatcat:lkiem6welja2noa6d3vvwwpohu

Thread-Sensitive Instruction Issue for SMT Processors

B. Robatmili, N. Yazdani, S. Sardashti, M. Nourani
2004 IEEE computer architecture letters  
In this paper, we propose a thread-sensitive issue policy for a partitioned SMT processor which is based on a thread metric.  ...  Simultaneous Multithreading (SMT) is a processor design method in which concurrent hardware threads share processor resources like functional units and memory.  ...  INTRODUCTION Simultaneous multithreading (SMT) is a processor design approach which permits multiple hardware threads to share functional units simultaneously in each cycle [6], [7].  ... 
doi:10.1109/l-ca.2004.9 fatcat:r3g3x3yjirgwlc56zfvaft5kea
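
A minimal sketch of what a thread-sensitive issue policy could look like in practice. The specific per-thread metric used below (fewest in-flight instructions, ICOUNT-style) and the data layout are illustrative assumptions, not the metric proposed in this paper.

```python
# Hypothetical sketch of a thread-sensitive issue policy for an SMT core.
# The metric (fewest in-flight instructions, ICOUNT-style) is an assumption
# for illustration, not the exact metric from the paper above.

def select_issue_order(threads, issue_width):
    """Rank runnable threads by a per-thread metric and hand out issue slots."""
    # Prefer threads with the fewest in-flight instructions: they are making
    # good progress and are less likely to clog shared queues.
    ranked = sorted(threads, key=lambda t: t["inflight"])
    schedule, slots = [], issue_width
    for t in ranked:
        if slots == 0:
            break
        take = min(slots, t["ready_insts"])
        if take:
            schedule.append((t["tid"], take))
            slots -= take
    return schedule

threads = [
    {"tid": 0, "inflight": 12, "ready_insts": 4},
    {"tid": 1, "inflight": 3,  "ready_insts": 2},
    {"tid": 2, "inflight": 7,  "ready_insts": 3},
]
print(select_issue_order(threads, issue_width=4))  # [(1, 2), (2, 2)]
```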

Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior

Yoongu Kim, Michael Papamichael, Onur Mutlu, Mor Harchol-Balter
2010 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture  
In a modern chip-multiprocessor system, memory is a shared resource among multiple concurrently executing threads.  ...  TCM introduces three major ideas for prioritization: 1) we prioritize the latency-sensitive cluster over the bandwidth-sensitive cluster to improve system throughput; 2) we introduce a "niceness" metric  ...  Yoongu Kim is supported by a Ph.D. fellowship from KFAS (Korea Foundation for Advanced Studies). We gratefully acknowledge the support of Gigascale Systems Research Center, AMD, Intel, and CyLab.  ... 
doi:10.1109/micro.2010.51 dblp:conf/micro/KimPMH10 fatcat:3ju5j4capjewpj3hcwuy2lvsgm
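
A simplified sketch of the clustering idea the abstract describes: the least memory-intensive threads form a latency-sensitive cluster that is prioritized at the memory controller, while the remaining threads form a bandwidth-sensitive cluster. The 10% cluster threshold and the use of MPKI as the intensity measure are illustrative assumptions, not a faithful re-implementation of TCM.

```python
# Simplified sketch of TCM's thread-clustering step. The threshold value and
# MPKI as the memory-intensity measure are assumptions for illustration.

def cluster_threads(threads, cluster_threshold=0.10):
    """Group the least memory-intensive threads into a latency-sensitive
    cluster (prioritized for low latency) until they account for
    cluster_threshold of total memory traffic; the rest are
    bandwidth-sensitive and share bandwidth among themselves."""
    total = sum(t["mpki"] for t in threads) or 1.0
    latency_cluster, bandwidth_cluster = [], []
    used = 0.0
    for t in sorted(threads, key=lambda t: t["mpki"]):
        if (used + t["mpki"]) / total <= cluster_threshold:
            latency_cluster.append(t["tid"])
            used += t["mpki"]
        else:
            bandwidth_cluster.append(t["tid"])
    return latency_cluster, bandwidth_cluster

threads = [{"tid": 0, "mpki": 0.4}, {"tid": 1, "mpki": 25.0},
           {"tid": 2, "mpki": 1.1}, {"tid": 3, "mpki": 40.0}]
print(cluster_threads(threads))  # ([0, 2], [1, 3])
```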

DITVA: Dynamic Inter-Thread Vectorization Architecture

Sajith Kalathingal, Sylvain Collange, Bharath N. Swamy, André Seznec
2018 Journal of Parallel and Distributed Computing  
To balance thread- and data-level parallelism, threads are statically grouped into fixed-size independently scheduled warps.  ...  By assembling dynamic vector instructions at runtime, DITVA extends an in-order SMT processor with a dynamic inter-thread vector execution mode akin to the Single-Instruction, Multiple-Thread model of  ...  DITVA uses a stack-less implicit mechanism, which prioritizes threads, to maximize convergent execution of an SPMD application.  ... 
doi:10.1016/j.jpdc.2017.11.006 fatcat:mp5mwj7kjrbiriczcidakmfq4m
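
A toy illustration of the core idea named in the abstract: at runtime, threads of a warp that sit at the same program counter can have their scalar instructions assembled into one dynamic vector instruction executed under a thread mask. The data structures here are assumptions made purely for illustration.

```python
# Toy sketch of dynamic inter-thread vectorization: threads in a warp that
# share the same PC are fused into one "dynamic vector instruction" with a
# list of participating threads. The representation is an assumption.

from collections import defaultdict

def assemble_dynamic_vector_instructions(warp_pcs):
    """warp_pcs maps thread id -> current PC. Threads at the same PC are
    grouped so one fetched/decoded instruction executes for all of them."""
    groups = defaultdict(list)
    for tid, pc in warp_pcs.items():
        groups[pc].append(tid)
    # One entry per distinct PC: (pc, active thread ids)
    return [(pc, sorted(tids)) for pc, tids in sorted(groups.items())]

warp = {0: 0x400, 1: 0x400, 2: 0x41c, 3: 0x400}
print(assemble_dynamic_vector_instructions(warp))
# [(1024, [0, 1, 3]), (1052, [2])] -> two dynamic vector instructions
```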

Partitioning Multi-Threaded Processors with a Large Number of Threads

A. El-Moursy, R. Garg, D.H. Albonesi, S. Dwarkadas
2005 IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.  
This paper examines processor partitioning options for larger numbers of threads on a chip.  ...  In a CMP organization, the gap between SMT and CMT processors shrinks further, making a CMP of CMT processors a highly viable alternative for the future.  ...  Each processor in this context runs the same number of threads, and threads are randomly assigned to processors.  ... 
doi:10.1109/ispass.2005.1430566 dblp:conf/ispass/El-MoursyGAD05 fatcat:4nwnbeei5jg7lpoo5j3664g2hq
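
The snippet notes that, in this evaluation, every processor runs the same number of threads and threads are randomly assigned. A minimal sketch of such a balanced random assignment, for illustration only:

```python
# Minimal sketch of the assignment described in the snippet: each processor
# receives the same number of threads, and the assignment is random.

import random

def balanced_random_assignment(num_threads, num_processors, seed=None):
    assert num_threads % num_processors == 0, "equal share per processor"
    per_proc = num_threads // num_processors
    tids = list(range(num_threads))
    random.Random(seed).shuffle(tids)
    return {p: tids[p * per_proc:(p + 1) * per_proc]
            for p in range(num_processors)}

print(balanced_random_assignment(8, 4, seed=0))
# {0: [...], 1: [...], 2: [...], 3: [...]} -- two thread ids per processor,
# which ids land where depends on the seed
```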

Extending database task schedulers for multi-threaded application code

Florian Wolf, Iraklis Psaroudakis, Norman May, Anastasia Ailamaki, Kai-Uwe Sattler
2015 Proceedings of the 27th International Conference on Scientific and Statistical Database Management - SSDBM '15  
Multi-threaded application code, however, introduces resource competition between application threads and the threads of the database task scheduler.  ...  We present a general approach to address this issue by integrating shared memory programming solutions into the task schedulers of databases.  ...  For our experiments, we use an HP Z620 workstation with two six-core Intel Xeon E5-2643 v2 processors at 3.50 GHz with Hyper-Threading enabled (for a total of 24 hardware contexts).  ... 
doi:10.1145/2791347.2791379 dblp:conf/ssdbm/WolfPMAS15 fatcat:skw2oyo4jjbmxpnfqjdhpi32hq
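
A sketch of the oversubscription problem behind this work: if the database task scheduler and the application's own threading spawn workers independently, they together exceed the hardware contexts (24 on the workstation cited above). One common remedy is to admit all work through a single concurrency limit; the classes and names below are assumptions, not the paper's interface.

```python
# Sketch of one way to avoid oversubscription between a database task
# scheduler and multi-threaded application code: both sides admit work
# through a shared limit sized to the hardware contexts. Illustrative only.

import threading
from concurrent.futures import ThreadPoolExecutor

HARDWARE_CONTEXTS = 24
admission = threading.BoundedSemaphore(HARDWARE_CONTEXTS)

def run_limited(pool, fn, *args):
    """Submit fn to the shared pool, but only run it while a hardware-context
    'token' is held, so DB tasks and application work together never exceed
    the machine's context count."""
    def wrapper():
        with admission:
            return fn(*args)
    return pool.submit(wrapper)

with ThreadPoolExecutor(max_workers=HARDWARE_CONTEXTS) as pool:
    db_task = run_limited(pool, sum, range(1_000_000))   # scheduler-side work
    app_task = run_limited(pool, max, range(1_000_000))  # application-side work
    print(db_task.result(), app_task.result())
```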

Multi-threading and one-sided communication in parallel LU factorization

Parry Husbands, Katherine Yelick
2007 Proceedings of the 2007 ACM/IEEE conference on Supercomputing - SC '07  
Nevertheless, the standard algorithm for this problem has non-trivial dependence patterns which limit parallelism, and local computations require large matrices in order to achieve good single processor  ...  Dense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines.  ...  hierarchies such as Cell), in the hopes of discovering a concise yet complete set of primitives that are useful for building high performance parallel applications.  ... 
doi:10.1145/1362622.1362664 dblp:conf/sc/HusbandsY07 fatcat:atejwvvfa5bv3ozdazpe77yt54
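
A compact sketch of the dependence pattern the abstract alludes to, using a generic right-looking blocked LU factorization without pivoting: each diagonal-block factorization and panel solve must finish before the trailing-matrix update, which is what limits parallelism. This is a textbook formulation, not the paper's one-sided-communication implementation.

```python
# Right-looking blocked LU factorization without pivoting (NumPy sketch),
# showing the panel -> trailing-update dependence that limits parallelism.
# Generic textbook formulation, not the paper's algorithm.

import numpy as np

def blocked_lu(A, nb):
    A = A.copy()
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # 1. Factor the diagonal block in place (unblocked LU, no pivoting).
        for j in range(k, e - 1):
            A[j + 1:e, j] /= A[j, j]
            A[j + 1:e, j + 1:e] -= np.outer(A[j + 1:e, j], A[j, j + 1:e])
        L_kk = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
        U_kk = np.triu(A[k:e, k:e])
        # 2. Panel: block column below and block row to the right.
        A[e:, k:e] = A[e:, k:e] @ np.linalg.inv(U_kk)
        A[k:e, e:] = np.linalg.inv(L_kk) @ A[k:e, e:]
        # 3. Trailing update: depends on the entire panel computed above.
        A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A  # L (unit lower) and U packed in place

A = np.random.default_rng(0).random((8, 8)) + 8 * np.eye(8)  # diagonally dominant
LU = blocked_lu(A, nb=4)
L, U = np.tril(LU, -1) + np.eye(8), np.triu(LU)
print(np.allclose(L @ U, A))  # True
```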

Virtual-Threading: Advanced General Purpose Processors Architecture [article]

Andrei I. Yafimau
2009 arXiv   pre-print
It is shown that an architecture well suited for GPPC implementation should have a high level of GLT, and such an architecture, called the Virtual-Threaded Machine, is described.  ...  This architecture is intended to effectively support General Purpose Parallel Computing (GPPC), the essence of which is extremely frequent switching of threads between states of activity and states  ...  The hardware-prioritized multiprogramming execution of a virtual set of threads in the contexts of a virtual set of processes.  ... 
arXiv:0910.4052v1 fatcat:jvjjjf5h55g5xaexkc2cqbtdfu

Characterizing thread placement in the IBM POWER7 processor

Stelios Manousopoulos, Miquel Moreto, Roberto Gioiosa, Nectarios Koziris, Francisco J. Cazorla
2012 2012 IEEE International Symposium on Workload Characterization (IISWC)  
In those processors, the way threads are assigned to different hardware contexts, denoted thread placement, plays a key role in improving overall performance.  ...  There is a clear trend in current processor design towards the combination of several thread level parallelism paradigms on the same chip, exemplified by processors such as the IBM POWER7.  ...  On the software level, the OS can initially schedule threads on the CMP level, while using the extra SMT contexts for higher thread numbers.  ... 
doi:10.1109/iiswc.2012.6402916 dblp:conf/iiswc/ManousopoulosMGKC12 fatcat:xlirdbccpvh7favedi3sccxgwe
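
A sketch of the placement policy the snippet mentions: spread threads across distinct cores first and only reuse SMT contexts on the same core once every core already holds a thread. The core and context counts below match POWER7 (8 cores, SMT4), but the policy itself is a simplified illustration, not the paper's characterization.

```python
# Spread-first thread placement sketch: fill one SMT context per core across
# the chip before doubling up threads on the same core. Core/context counts
# follow POWER7 (8 cores x SMT4); the policy is a simplified illustration.

def placement_order(num_cores=8, smt_per_core=4):
    """Return hardware contexts in the order the OS should fill them."""
    return [(core, ctx)                      # (core id, SMT context id)
            for ctx in range(smt_per_core)   # outer loop: SMT level
            for core in range(num_cores)]    # inner loop: spread over cores

def place_threads(num_threads, num_cores=8, smt_per_core=4):
    return placement_order(num_cores, smt_per_core)[:num_threads]

# 10 threads: 8 land on distinct cores, the last 2 reuse cores 0 and 1.
print(place_threads(10))
# [(0, 0), (1, 0), ..., (7, 0), (0, 1), (1, 1)]
```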

Asynchronous programs with prioritized task-buffers

Michael Emmi, Akash Lal, Shaz Qadeer
2012 Proceedings of the ACM SIGSOFT 20th International Symposium on the Foundations of Software Engineering - FSE '12  
Guided by real-world applications of asynchronous programming, we propose a new model that enriches the asynchronous programming model by adding two new features: multiple task-buffers and multiple task-priority  ...  For example, pushdown systems are a natural (and popular) model for sequential recursive programs that isolate the call-return semantics of procedure calls.  ...  Whether executing on a single or across multiple processors, a collection of software threads-each essentially behaving as recursive sequential programs-execute concurrently, interleaving their read and  ... 
doi:10.1145/2393596.2393652 dblp:conf/sigsoft/EmmiLQ12 fatcat:4zyel4flzvc3jkqvbfniynbkiu
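
A toy model of the two features the abstract adds to the asynchronous programming model: multiple task-buffers and multiple priority levels, with a dispatcher that always runs a pending task from the highest-priority non-empty buffer. FIFO order within a buffer and the posting interface are assumptions made for illustration.

```python
# Toy model of asynchronous programs with prioritized task-buffers: tasks go
# into per-priority buffers, and the dispatcher always picks from the
# highest-priority non-empty buffer. FIFO order within a buffer is assumed.

from collections import deque

class PrioritizedTaskBuffers:
    def __init__(self, priorities):
        # one FIFO buffer per priority level (higher number = more urgent)
        self.buffers = {p: deque() for p in priorities}

    def post(self, priority, task):
        self.buffers[priority].append(task)

    def dispatch_all(self):
        while True:
            nonempty = [p for p, b in self.buffers.items() if b]
            if not nonempty:
                break
            task = self.buffers[max(nonempty)].popleft()
            task(self)  # a running task may post further tasks

def low(tb):  print("low-priority task")
def high(tb): print("high-priority task"); tb.post(0, low)

tb = PrioritizedTaskBuffers(priorities=[0, 1])
tb.post(0, low)
tb.post(1, high)
tb.dispatch_all()
# prints the high-priority task first, then the two low-priority tasks
```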

Per-thread cycle accounting in multicore processors

Kristof Du Bois, Stijn Eyerman, Lieven Eeckhout
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
Unpredictable per-thread performance becomes a problem when considered in the context of multicore scheduling: system software assumes that all threads make equal progress; however, this is not what the  ...  This article proposes a hardware-efficient per-thread cycle accounting architecture for multicore processors.  ...  ACKNOWLEDGMENTS We thank the anonymous reviewers for their constructive and insightful feedback.  ... 
doi:10.1145/2400682.2400688 fatcat:z5nggd3j4jd7lan6pktxc7xp7i
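
The point of such accounting is to let system software estimate how much each thread is slowed down by co-runners. A minimal sketch of the usual derivation, assuming the hardware attributes interference cycles to each thread; the exact event breakdown is an assumption, not the paper's accounting architecture.

```python
# Minimal sketch of how per-thread cycle accounting feeds a slowdown
# estimate: slowdown_i = shared_cycles_i / estimated_alone_cycles_i, where
# alone cycles = shared cycles minus hardware-accounted interference cycles.
# The event breakdown is an illustrative assumption.

def per_thread_slowdown(counters):
    """counters: tid -> dict with total 'shared_cycles' and hardware-accounted
    'interference_cycles' (cache, memory, and queueing interference combined)."""
    slowdowns = {}
    for tid, c in counters.items():
        alone = c["shared_cycles"] - c["interference_cycles"]
        slowdowns[tid] = c["shared_cycles"] / alone
    return slowdowns

counters = {
    0: {"shared_cycles": 1_200_000, "interference_cycles": 200_000},
    1: {"shared_cycles": 1_050_000, "interference_cycles":  50_000},
}
print(per_thread_slowdown(counters))  # {0: 1.2, 1: 1.05}
```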

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

Jack L. Lo, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, Dean M. Tullsen, S. J. Eggers
1997 ACM Transactions on Computer Systems  
Wide-issue superscalar processors exploit ILP by executing multiple instructions from a single program in a single cycle.  ...  The most compelling reason for running parallel applications on an SMT processor is its ability to use thread-level parallelism and instruction-level parallelism interchangeably. By permitting  ...  ACKNOWLEDGMENTS We would like to thank John O'Donnell of Equator Technologies, Inc. and Tryggve Fossum of Digital Equipment Corp. for the source to the Alpha AXP version of the Multiflow compiler.  ... 
doi:10.1145/263326.263382 fatcat:urempgsyi5fmffbfxkr7s6zcju

Dynamic Inter-Thread Vectorization Architecture: Extracting DLP from TLP

Sajith Kalathingal, Sylvain Collange, Bharath N. Swamy, Andre Seznec
2016 2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)  
To balance thread- and data-level parallelism, threads are statically grouped into fixed-size independently scheduled warps.  ...  DITVA extends a SIMD-enabled in-order SMT processor with an inter-thread vectorization execution mode.  ...  In particular, when all threads within a warp are divergent, the MinSP-PC thread will be scheduled twice every m + 1 scheduling cycles for the warp, while each other thread will be scheduled once every  ... 
doi:10.1109/sbac-pad.2016.11 dblp:conf/sbac-pad/KalathingalCSS16 fatcat:cw2do6mpd5hyno4lhi4ulewofm
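
A sketch of the scheduling pattern quoted in the snippet: when all m threads of a warp are divergent, the thread at the minimum (SP, PC) point is favored, receiving two issue slots every m + 1 scheduling cycles while every other thread receives one, so that lagging threads can reconverge. In the real design the MinSP-PC thread is re-evaluated as threads advance; the fixed pattern below is a simplification.

```python
# Sketch of the divergent-warp schedule quoted in the snippet: the MinSP-PC
# thread gets two slots every m + 1 cycles, every other thread gets one.
# In reality the MinSP-PC thread is recomputed each cycle; this fixes it.

def divergent_schedule(threads, min_sp_pc_tid, rounds=2):
    """threads: list of thread ids in a fully divergent warp (len == m)."""
    others = [t for t in threads if t != min_sp_pc_tid]
    pattern = [min_sp_pc_tid] + others + [min_sp_pc_tid]  # m + 1 slots
    return pattern * rounds

# Warp of 4 divergent threads, thread 2 holds the minimum (SP, PC):
print(divergent_schedule([0, 1, 2, 3], min_sp_pc_tid=2))
# [2, 0, 1, 3, 2, 2, 0, 1, 3, 2] -> thread 2 issues twice per 5 cycles
```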
Showing results 1 — 15 out of 2,648 results