Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

Jack L. Lo, Joel S. Emer, Henry M. Levy, Rebecca L. Stamm, Dean M. Tullsen, S. J. Eggers
1997 ACM Transactions on Computer Systems  
To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruction-level parallelism (ILP) and thread-level parallelism (TLP).  ...  This article explores parallel processing on an alternative architecture, simultaneous multithreading (SMT), which allows multiple threads to compete for and share all of the processor's resources every  ...  ACKNOWLEDGMENTS We would like to thank John O'Donnell of Equator Technologies, Inc. and Tryggve Fossum of Digital Equipment Corp. for the source to the Alpha AXP version of the Multiflow compiler.  ... 
doi:10.1145/263326.263382 fatcat:urempgsyi5fmffbfxkr7s6zcju

An elementary processor architecture with simultaneous instruction issuing from multiple threads

Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura, Yoshimori Nakase, Teiji Nishizawa
1992 Proceedings of the 19th annual international symposium on Computer architecture - ISCA '92  
In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there  ...  Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize in vector or VLIW machines.  ...  On the other hand, parallel multithreading within a processor is a latency-hiding technique at the instruction level.  ... 
doi:10.1145/139669.139710 dblp:conf/isca/HirataKNMNNN92 fatcat:2mtvetydrjberag77rtikjl2xa

An elementary processor architecture with simultaneous instruction issuing from multiple threads

Hiroaki Hirata, Kozo Kimura, Satoshi Nagamine, Yoshiyuki Mochizuki, Akio Nishimura, Yoshimori Nakase, Teiji Nishizawa
1992 SIGARCH Computer Architecture News  
In our processor architecture, instructions from different threads (not a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there  ...  Another loop execution scheme, by using the multiple control flow mechanism of our architecture, makes it possible to parallelize loops which are difficult to parallelize in vector or VLIW machines.  ...  On the other hand, parallel multithreading within a processor is a latency-hiding technique at the instruction level.  ... 
doi:10.1145/146628.139710 fatcat:xklw4rswkbczjk63mmwkynn5vi

Integrating multiple forms of multithreaded execution on multi-SMT systems: a study with scientific applications

M. Curtis-Maury, Tanping Wang, C. Antonopoulos, D. Nikolopoulos
2005 Second International Conference on the Quantitative Evaluation of Systems (QEST'05)  
Most scientific applications have high degrees of parallelism and thread-level parallel execution appears to be a natural choice for executing these applications on systems composed of SMT processors.  ...  Our study shows that combining adaptive throttling and speculative precomputation with regular thread-level parallelization leads to significant performance improvements in parallel codes which suffer  ...  The authors would like to thank Xavier Martorell of DAC-UPC in Barcelona and IBM Research, for recommending the nested OpenMP execution model for merging speculative precomputation with thread-level parallel  ... 
doi:10.1109/qest.2005.16 dblp:conf/qest/Curtis-MauryW05 fatcat:tgr5h2c7xrbd7f6txgmzygusui

Multiple Instruction Stream Processor

Richard A. Hankins, Gautham N. Chinya, Jamison D. Collins, Perry H. Wang, Ryan Rakvic, Hong Wang, John P. Shen
2006 SIGARCH Computer Architecture News  
MISP allows an application program to directly manage user-level threads without OS intervention.  ...  MISP introduces the sequencer as a new category of architectural resource, and defines a canonical set of instructions to support user-level inter-sequencer signaling and asynchronous control transfer.  ...  are still restricted to a single instruction stream and cannot directly exploit thread level parallelism.  ... 
doi:10.1145/1150019.1136495 fatcat:ibcv5a3h5fhnzjpvcw7jlk5xli

Responsive Multithreaded Processor for Distributed Real-Time Systems

Nobuyuki Yamasaki
2005 Journal of Robotics and Mechatronics  
The RMT Processing Unit (RMT PU) executes eight prioritized threads simultaneously using fine-grained multithreading based on priority, called the RMT architecture.  ...  System designers use on-chip functions easily by connecting required I/Os to this chip and the designers realize distributed control by connecting several RMT Processors with their own functions via Responsive  ...  This study was also contributed to by the fund of the CREST, JST.  ... 
doi:10.20965/jrm.2005.p0130 fatcat:zqbf2hnxrnchnafvikfod5vnsy

Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

Allan Snavely, Dean M. Tullsen, Geoff Voelker
2002 Performance Evaluation Review  
Simultaneous Multithreading machines benefit from jobscheduling software that monitors how well coscheduled jobs share CPU resources, and coschedules jobs that interact well to make more efficient use  ...  This paper demonstrates that a scheduler for an SMT machine can both satisfy process priorities and symbiotically schedule low and high priority threads to increase system throughput.  ...  This is because thread level parallelism (TLP) is converted into instruction level parallelism (ILP).  ... 
doi:10.1145/511399.511343 fatcat:no34wmfvcnhnbclxzkvxgt7x2a

Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

Allan Snavely, Dean M. Tullsen, Geoff Voelker
2002 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '02  
Simultaneous Multithreading machines benefit from jobscheduling software that monitors how well coscheduled jobs share CPU resources, and coschedules jobs that interact well to make more efficient use  ...  This paper demonstrates that a scheduler for an SMT machine can both satisfy process priorities and symbiotically schedule low and high priority threads to increase system throughput.  ...  This is because thread level parallelism (TLP) is converted into instruction level parallelism (ILP).  ... 
doi:10.1145/511339.511343 fatcat:s4kduxzdz5f3pj3twqa3z5fwba

Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

Allan Snavely, Dean M. Tullsen, Geoff Voelker
2002 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems - SIGMETRICS '02  
Simultaneous Multithreading machines benefit from jobscheduling software that monitors how well coscheduled jobs share CPU resources, and coschedules jobs that interact well to make more efficient use  ...  This paper demonstrates that a scheduler for an SMT machine can both satisfy process priorities and symbiotically schedule low and high priority threads to increase system throughput.  ...  This is because thread level parallelism (TLP) is converted into instruction level parallelism (ILP).  ... 
doi:10.1145/511334.511343 dblp:conf/sigmetrics/SnavelyTV02 fatcat:w43kfzdrijg7fnnqm5v2j4fvty

Exploring the performance limits of simultaneous multithreading for memory intensive applications

Evangelia Athanasaki, Nikos Anastopoulos, Kornilios Kourtis, Nectarios Koziris
2007 Journal of Supercomputing  
Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor.  ...  However, the speedup of a single application that is parallelized into multiple threads, is often sensitive to its inherent instruction level parallelism (ILP), as well as the efficiency of synchronization  ...  In this way, thread level parallelism is effectively converted into instruction level parallelism.  ... 
doi:10.1007/s11227-007-0149-x fatcat:x732lxzwkbfvtfmnirmsrm3k7i

Parallelization of a dynamic unstructured application using three leading paradigms

Leonid Oliker, Rupak Biswas
1999 Proceedings of the 1999 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '99  
Our overall results demonstrate that multithreaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.  ...  version on the newly-released Tera Multithreaded Architecture (MTA).  ...  If a thread is waiting for its memory reference to complete, the processor executes instructions from other threads.  ... 
doi:10.1145/331532.331571 dblp:conf/sc/OlikerB99 fatcat:3fbdskwh3bb3dnpwm73zag4xlm

Scalable Parallel Distributed Coprocessor System for Graph Searching Problems with Massive Data

Wanrong Huang, Xiaodong Yi, Yichun Sun, Yingwen Liu, Shuai Ye, Hengzhu Liu
2017 Scientific Programming  
Within the limits permitted by the Graph500 test bench after 1D parallel hybrid BFS algorithm testing, our 8-core and 8-thread-per-core system achieved superior performance and efficiency compared with  ...  We design a binary search algorithm to address mapping to unify all processor addresses.  ...  An instruction register is designed for each thread to store the last instruction. When the thread is blocked, it could take the last instruction from the instruction register.  ... 
doi:10.1155/2017/1496104 fatcat:zdtoc6ythndvrkibjkrawq54nq

Symbiotic jobscheduling for a simultaneous multithreading processor

Allan Snavely, Dean M. Tullsen
2000 SIGPLAN notices  
Simultaneous Multithreading machines fetch and execute instructions from multiple instruction streams to increase system utilization and speedup the execution of jobs.  ...  on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler.  ...  ACKNOWLEDGEMENTS We would like to thank the anonymous reviewers for their useful comments.  ... 
doi:10.1145/356989.357011 fatcat:gikffljryfegjikqudar4v4usa

Architecture of Embedded Microprocessors [chapter]

Eric Rotenberg, Aravindh Anantaraman
2005 Multiprocessor Systems-on-Chips  
Data-level parallelism and regular control flow in streaming applications has led engineers to base many DSPs on the very long instruction word (VLIW) execution model [149], which encodes regular parallelism  ...  A personal computer is expected to run arbitrary software (Fig. 4-1): productivity tools (e-mail, word processors, spreadsheets, presentations, and so on), computer-aided design (CAD), games, multimedia  ...  Hardware multithreading improves pipeline utilization by converting thread-level parallelism to instruction-level parallelism [172] [173] [174].  ... 
doi:10.1016/b978-012385251-9/50018-9 fatcat:smv3zphpnjfvrh5mslzjvgz4fa

FlexGrip: A soft GPGPU for FPGAs

Kevin Andryc, Murtaza Merchant, Russell Tessier
2013 2013 International Conference on Field-Programmable Technology (FPT)  
However, it is difficult to straightforwardly extend their functionality to support conditional and thread-based execution characteristic of general-purpose graphics processing units (GPGPUs) without recompiling  ...  This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU without hardware recompilation.  ...  Today, GPUs are widely used to evaluate highly multithreaded data parallel applications expressed in high-level languages such as CUDA and OpenCL.  ... 
doi:10.1109/fpt.2013.6718358 dblp:conf/fpt/AndrycMT13 fatcat:7ey67anaezbj7p7dgz2qtlnzty