Multiplex

Chong-Liang Ooi, Seon Wook Kim, Il Park, Rudolf Eigenmann, Babak Falsafi, T. N. Vijaykumar
2001 Proceedings of the 15th international conference on Supercomputing - ICS '01  
sequential execution stream and speculatively executes them in parallel on multiple processor cores.  ...  Multiplex exploits the similarities between implicit and explicit threading, and provides a unified support for the two threading models without additional hardware.  ...  MULTIPLEX: UNIFYING EXPLICIT/IMPLICIT THREADING. In this paper, we propose Multiplex, an architecture that unifies explicit and implicit threading on a chip multiprocessor.  ... 
doi:10.1145/377792.377863 dblp:conf/ics/OoiKPEFV01 fatcat:fuke25apgnbn7eqhxod62zsz3m

Multi-Threaded Processors [chapter]

David Padua, Amol Ghoting, John A. Gunnels, Mark S. Squillante, José Meseguer, James H. Cownie, Duncan Roweth, Sarita V. Adve, Hans J. Boehm, Sally A. McKee, Robert W. Wisniewski, George Karypis (+29 others)
2011 Encyclopedia of Parallel Computing  
The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip.  ...  The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors.  ...  Here thread-level parallelism is utilized, typically in combination with thread-level speculation [15].  ... 
doi:10.1007/978-0-387-09766-4_423 fatcat:heb3n2cfwnbi5nvxv5kvxd2xgm

Multithreaded Processors

T. Ungerer
2002 Computer journal  
The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip.  ...  The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors.  ...  Here thread-level parallelism is utilized, typically in combination with thread-level speculation [15].  ... 
doi:10.1093/comjnl/45.3.320 fatcat:hlkkabuhrzhkrmuyqomzfmc6zm

A survey of processors with explicit multithreading

Theo Ungerer, Borut Robič, Jurij Šilc
2003 ACM Computing Surveys  
The contexts of two or more threads of control are often stored in separate on-chip register sets.  ...  Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple  ...  speculative multithreading), and chip multiprocessors.  ... 
doi:10.1145/641865.641867 fatcat:u6x7jdmkfvexnm3culskjsoxwi

A single-chip multiprocessor

B.A. Nayfeh, K. Olukotun
1997 Computer  
Additionally, having all eight of the CPUs on a single chip allows designers to exploit thread-level parallelism even when threads communicate frequently.  ...  The processor core dynamically allocates instruction fetch and execution resources among the different threads on a cycle-by-cycle basis to find as much thread-level and instruction-level parallelism as  ... 
doi:10.1109/2.612253 fatcat:l645n6krxnaphalnk5w6pogwye

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency

Kunle Olukotun, Lance Hammond, James Laudon
2007 Synthesis Lectures on Computer Architecture  
Figure 4.8: Overall speedup obtained in different  ...  Unlike conventional uniprocessors, multicore chips can use TLP, and can therefore also take advantage of threads to utilize parallelism from the traditional large-grain task and process level parallelism  ...  Olukotun led the Stanford Hydra project which developed the first chip multiprocessor (multicore chip) with support for thread-level speculation.  ... 
doi:10.2200/s00093ed1v01y200707cac003 fatcat:qyjilavdhfcmlnc46l5sxg7ssq

A Survey on Hardware and Software Support for Thread Level Parallelism [article]

Somnath Mazumdar, Roberto Giorgi
2016 arXiv   pre-print
Today's computers are built upon multiple processing cores and run applications consisting of a large number of threads, making runtime thread management a complex process.  ...  To support growing massive parallelism, functional components and also the capabilities of current processors are changing and continue to do so.  ...  The aim of Cilk is to help programmers to build applications optimized for a maximum level of parallelism on shared-memory multiprocessors (SMPs).  ... 
arXiv:1603.09274v3 fatcat:75isdvgp5zbhplocook6273sq4

High-Performance Energy-Efficient Multicore Embedded Computing

A. Munir, S. Ranka, A. Gordon-Ross
2012 IEEE Transactions on Parallel and Distributed Systems  
With Moore's law supplying billions of transistors on-chip, embedded systems are undergoing a transition from single-core to multicore to exploit this high-transistor density for high performance.  ...  The increase in on-chip transistor density exacerbates power/thermal issues in embedded systems, which necessitates novel hardware/software power/thermal management techniques to meet the ever-increasing  ...  ACKNOWLEDGMENTS This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the US National Science Foundation (NSF) (CNS-0953447 and CNS-0905308).  ... 
doi:10.1109/tpds.2011.214 fatcat:vagqmojdsjevvc2u2ewqrcjjpq

Factored operating systems (fos)

David Wentzlaff, Anant Agarwal
2009 ACM SIGOPS Operating Systems Review  
The next decade will afford us computer chips with 100's to 1,000's of cores on a single piece of silicon.  ...  Contemporary operating systems have been designed to operate on a single core or small number of cores and hence are not well suited to manage and provide operating system services at such large scale.  ...  Acknowledgments This work is funded by DARPA, Quanta Computing, Google, and the NSF. We thank Robert Morris and Frans Kaashoek for feedback on this work.  ... 
doi:10.1145/1531793.1531805 fatcat:vdak4y4dt5cavlcqj7s7q4p3bu

A Survey of Coarse-Grained Reconfigurable Architecture and Design

Leibo Liu, Jianfeng Zhu, Zhaoshi Li, Yanan Lu, Yangdong Deng, Jie Han, Shouyi Yin, Shaojun Wei
2019 ACM Computing Surveys  
This article reviews the architecture and design of CGRAs thoroughly for the purpose of exploiting their full potential. First, a novel multidimensional taxonomy is proposed.  ...  As general-purpose processors have hit the power wall and chip fabrication cost escalates alarmingly, coarse-grained reconfigurable architectures (CGRAs) are attracting increasing interest from both academia  ...  Fig. 10. (a) Conventional processor system, (b) expanding the on-chip memory size, (c) PIM at memory interface, and (d) distributed PIM at memory array.  ... 
doi:10.1145/3357375 fatcat:pqi4d33i6bg45a6llswhwd44qi

Core-Selectability in Chip Multiprocessors

Hashem Hashemi Najaf-abadi, Niket Kumar Choudhary, Eric Rotenberg
2009 2009 18th International Conference on Parallel Architectures and Compilation Techniques  
The centralized structures necessary for the extraction of instruction-level parallelism (ILP) are consuming progressively smaller portions of the total die area of chip multiprocessors (CMP).  ...  In addition, it can provide significantly greater throughput to multiprogrammed workloads by providing the potential for the system to transform into a heterogeneous design.  ...  CCF-0811707, and funding from Intel and IBM.  ... 
doi:10.1109/pact.2009.44 dblp:conf/IEEEpact/Najaf-abadiCR09 fatcat:qvlgvgvhdrd5rkxh23xatavjv4

A case for FAME

Zhangxi Tan, Andrew Waterman, Henry Cook, Sarah Bird, Krste Asanović, David Patterson
2010 Proceedings of the 37th annual international symposium on Computer architecture - ISCA '10  
The study demonstrates FAME's capabilities: we run a modern parallel benchmark suite on a research operating system, simulate 64-core target architectures with multi-level memory hierarchy timing models  ...  To clear up misconceptions about FPGA-based simulation methodologies, we propose a FAME taxonomy to distinguish the cost-performance of variations on these ideas.  ...  Thanks to the ROS implementers for their assistance on the RAMP Gold port, and to Kevin Klues in particular for providing page coloring support to facilitate our case study.  ... 
doi:10.1145/1815961.1815999 dblp:conf/isca/TanWCBAP10 fatcat:4u2ves3qn5ckdhgocyzfplx4z4

A case for FAME

Zhangxi Tan, Andrew Waterman, Henry Cook, Sarah Bird, Krste Asanović, David Patterson
2010 SIGARCH Computer Architecture News  
The study demonstrates FAME's capabilities: we run a modern parallel benchmark suite on a research operating system, simulate 64-core target architectures with multi-level memory hierarchy timing models  ...  To clear up misconceptions about FPGA-based simulation methodologies, we propose a FAME taxonomy to distinguish the cost-performance of variations on these ideas.  ...  Thanks to the ROS implementers for their assistance on the RAMP Gold port, and to Kevin Klues in particular for providing page coloring support to facilitate our case study.  ... 
doi:10.1145/1816038.1815999 fatcat:v3bebnwebzdmzdutz4d345rzdq

Simplifying Active Memory Clusters by Leveraging Directory Protocol Threads

Dhiraj D. Kalamkar, Mainak Chaudhuri, Mark Heinrich
2007 2007 IEEE International Symposium on Performance Analysis of Systems & Software  
The proposed protocol extensions yield speedup of 1.45 for parallel reduction and 1.29 for matrix transpose on a 16-node DSM multiprocessor when compared to non-active memory baseline systems and achieve  ...  In this paper we make the important observation that on a traditional flexible distributed shared memory (DSM) multiprocessor node, equipped with a coherence protocol thread context as in SMTp or a simple  ...  (DSM) multiprocessor with nodes capable of running one or two application threads, thereby allowing us to experiment with 16- and 32-way threaded parallel applications.  ... 
doi:10.1109/ispass.2007.363754 dblp:conf/ispass/KalamkarCH07 fatcat:c4ycfckyozfnho2jxcbllzv7a4

Achieving Programming Model Abstractions for Reconfigurable Computing

D. Andrews, R. Sass, E. Anderson, J. Agron, W. Peck, J. Stevens, F. Baijot, E. Komp
2008 IEEE Transactions on Very Large Scale Integration (VLSI) Systems  
This enables all platform components to be abstracted into a unified multiprocessor architecture platform.  ...  Presently accepted hybrid CPU/FPGA computational models-and access to these computational models via high level languages-focus on programming language extensions to increase accessibility and portability  ...  Two hardware threads running in parallel show 7.1 speedup when compared to two software threads time-multiplexing on the CPU.  ... 
doi:10.1109/tvlsi.2007.912106 fatcat:qepiyqik7nh67df27xjddlzzcm