
A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization
Korean title: A Block Latency-Based Warp Scheduling Technique for Improving GPGPU Resource Utilization

Do Cong Thuan, Yong Choi, Jong Myon Kim, Cheol Hong Kim
2017 KIPS Transactions on Computer and Communication Systems  
when encountering a high contention degree at the memory and interconnection network.  ...  One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long latency instructions, resulting in lost opportunity to improve the performance.  ...  multiprocessors when encountering a high degree of contention (in terms of memory and interconnection network bandwidth).  ... 
doi:10.3745/ktccs.2017.6.5.219 fatcat:q6l2q3yt6bhnrbf4fjvwo5p56i

Architectural approach to the role of optics in monoprocessor and multiprocessor machines

Jacques Henri Collet, Daniel Litaize, Jan Van Campenhout, Chris Jesshope, Marc Desmulliez, Hugo Thienpont, James Goodman, Ahmed Louri
2000 Applied Optics  
This relation leaves the choice of the best network open in terms of simplicity and latency reduction.  ...  The relevance of introducing optical interconnects (OI's) in monoprocessors and multiprocessors is studied from an architectural point of view.  ...  This study partly summarizes the conclusions of the Workshop on Optical Communications and Computer Sciences (WOCCS) that was held in Toulouse, France, in March 1999.  ... 
doi:10.1364/ao.39.000671 pmid:18337941 fatcat:mowm54a6b5bidk3v5epnutx2a4

Shared Memory Multiprocessors [chapter]

2004 Parallel Computing on Heterogeneous Networks  
Interconnection networks The interconnection network design space has four dimensions: • Topology defines the physical interconnection structure of the network graph.  ...  However, in a system based on an interconnection network, writes can be reordered by the network.  ... 
doi:10.1002/0471654167.ch3 fatcat:dvaj7kmetfgr7bkmdrmvzljwda

The Stanford Dash multiprocessor

D. Lenoski, J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, M.S. Lam
1992 Computer  
The design of the prototype has provided deeper insight into the architectural and implementation challenges that arise in a large-scale machine with a single address space.  ...  The prototype will also serve as a platform for studying real applications and software on a large parallel system.  ...  Acknowledgments This research was supported by DARPA contracts N00014-87-K-0828 and N00039-91-C-0138. In addition.  ... 
doi:10.1109/2.121510 fatcat:3vidyyjlpncg3flegtqzfm76ky

MEmory performance

S. Bartolini, P. Foglia, C. A. Prete
2007 SIGARCH Computer Architecture News  
In particular, the problem of hiding/tolerating memory latencies is exacerbated by wire-delay and power-consumption issues.  ...  In fact, it is the interaction between the static/dynamic features of the application and the system on which it executes that stresses the memory subsystem and pushes towards specific solutions.  ...  network processors, in which Monchiero, Silvano et al. explore optimization techniques for synchronization mechanisms in MPSoCs relying on complex interconnection (Network-on-Chip), targeted to future  ... 
doi:10.1145/1327312.1327314 fatcat:lcl2hualvrcyhj3tancv7dpmyq

Memory performance

S. Bartolini, P. Foglia, R. Giorgi, C. A. Prete
2006 SIGARCH Computer Architecture News  
In particular, the problem of hiding/tolerating memory latencies is exacerbated by wire-delay and power-consumption issues.  ...  In fact, it is the interaction between the static/dynamic features of the application and the system on which it executes that stresses the memory subsystem and pushes towards specific solutions.  ...  network processors, in which Monchiero, Silvano et al. explore optimization techniques for synchronization mechanisms in MPSoCs relying on complex interconnection (Network-on-Chip), targeted to future  ... 
doi:10.1145/1147349.1147352 fatcat:ckimojzbhbav5g5spt3l5of4gy

Energy-Efficient Hardware Prefetching for CMPs Using Heterogeneous Interconnects

Antonio Flores, Juan L. Aragón, Manuel E. Acacio
2010 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing  
Our proposal is based on the fact that the wires used in the on-chip interconnection network can be designed with varying latency, bandwidth and power characteristics.  ...  On the other hand, CMP designs are likely to be equipped with latency hiding techniques like hardware prefetching in order to reduce the negative impact on performance that, otherwise, high cache miss  ...  We believe that the use of an appropriately adjusted LA-PC could reduce the small degradation in execution time experienced by our proposal.  ... 
doi:10.1109/pdp.2010.12 dblp:conf/pdp/FloresAA10 fatcat:zghoeyy4yfgfrgimg6tjeutske

Trends in shared memory multiprocessing

P. Stenstrom, E. Hagersten, D.J. Lilja, M. Martonosi, M. Venugopal
1997 Computer  
Concerns include the performance of the I/O subsystem (both on the network and disk sides) and reliability, availability, and serviceability. • Scientific and engineering.  ...  The first step in meeting this challenge is to carefully examine the current use of shared memory multiprocessing and arrive at intelligent projections of future use based on application and technology  ...  Acknowledgments We thank Yale Patt, who initiated the set of task forces that allowed us to develop our thoughts in a creative environment in Hawaii.  ... 
doi:10.1109/2.642814 fatcat:mhsgglxwfvdrtc4c4ap6eshxxa

An analysis of on-chip interconnection networks for large-scale chip multiprocessors

Daniel Sanchez, George Michelogiannakis, Christos Kozyrakis
2010 ACM Transactions on Architecture and Code Optimization (TACO)  
In this article, we explore the architectural-level implications of interconnection network design for CMPs with up to 128 fine-grain multithreaded cores.  ...  We find that the interconnect has a large impact on performance, as it is responsible for 60% to 75% of the miss latency.  ...  ACKNOWLEDGMENTS We sincerely thank Woongki Baek, Hari Kannan, Jacob Leverich, and the anonymous reviewers for their useful feedback on earlier versions of this manuscript.  ... 
doi:10.1145/1736065.1736069 fatcat:nbhnzmatgjbuzgffmji3wh6wey

DSM perspective: another point of view

G. Bell, C. van Ingen
1999 Proceedings of the IEEE  
Gray for his interaction and tireless editing of two of the drafts.  ...  ACKNOWLEDGMENT The authors would like to thank the editors and reviewers who stimulated them to solidify and clarify their position so as to present DSM in a less biased light.  ...  Commercial performance depends on record throughput per second; disk access latency often hides computing or messaging latency.  ... 
doi:10.1109/5.747862 fatcat:47y3mush5jestl2gklmgdtghwi

The directory-based cache coherence protocol for the DASH multiprocessor

Daniel Lenoski, James Laudon, Kourosh Gharachorloo, Anoop Gupta, John Hennessy
1990 SIGARCH Computer Architecture News  
Unlike traditional snoopy coherence protocols, the DASH protocol does not rely on broadcast; instead it uses point-to-point messages sent between the processors and memories to keep caches consistent.  ...  In this paper, we present the design of the DASH coherence protocol and discuss how it addresses the above issues. We also discuss our strategy for verifying the correctness of the protocol and briefly  ...  In particular, we would like to thank Wolf-Dietrich Weber for creating the DASH simulator, Helen Davis and Stephen Goldschmidt for modifying their Tango simulator to interact with the DASH simulator, and  ... 
doi:10.1145/325096.325132 fatcat:gottedibh5hu7m3qufd44f6jmm

PicoServer

Taeho Kgil, Shaun D'Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Trevor Mudge, Steven Reinhardt, Krisztian Flautner
2006 SIGPLAN notices  
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores.  ...  In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing.  ...  This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640.  ... 
doi:10.1145/1168918.1168873 fatcat:lwsq2dsbxfe7fgpak3wajsbrgu

PicoServer

Taeho Kgil, Shaun D'Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Trevor Mudge, Steven Reinhardt, Krisztian Flautner
2006 SIGARCH Computer Architecture News  
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores.  ...  In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing.  ...  This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640.  ... 
doi:10.1145/1168919.1168873 fatcat:5caghuwhojfd5p4kdhaq5qckiu

PicoServer

Taeho Kgil, Shaun D'Souza, Ali Saidi, Nathan Binkert, Ronald Dreslinski, Trevor Mudge, Steven Reinhardt, Krisztian Flautner
2006 Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII  
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores.  ...  In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing.  ...  This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640.  ... 
doi:10.1145/1168857.1168873 dblp:conf/asplos/KgilDSBDMRF06 fatcat:didbkyujwfddjoviyhj6huxxum

Interconnections in Multi-Core Architectures

Rakesh Kumar, Victor Zyuban, Dean M. Tullsen
2005 SIGARCH Computer Architecture News  
This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures  ...  It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget.  ...  This work was supported in part by NSF Grant No. CCR-0311683 and an IBM internship.  ... 
doi:10.1145/1080695.1070004 fatcat:nm44nhgbbzhkvdhj63vut6elyu