A Novel Cooperative Warp and Thread Block Scheduling Technique for Improving the GPGPU Resource Utilization
A Block-Latency-Based Warp Scheduling Technique for Improving GPGPU Resource Utilization
2017
KIPS Transactions on Computer and Communication Systems
when encountering a high contention degree at the memory and interconnection network. ...
One of the primary reasons is the inefficiency of existing warp/thread block schedulers in hiding long latency instructions, resulting in lost opportunity to improve the performance. ...
multiprocessors when encountering a high degree of contention (in terms of memory and interconnection network bandwidth). ...
doi:10.3745/ktccs.2017.6.5.219
fatcat:q6l2q3yt6bhnrbf4fjvwo5p56i
Architectural approach to the role of optics in monoprocessor and multiprocessor machines
2000
Applied Optics
This relation leaves the choice of the best network open in terms of simplicity and latency reduction. ...
The relevance of introducing optical interconnects (OI's) in monoprocessors and multiprocessors is studied from an architectural point of view. ...
This study partly summarizes the conclusions of the Workshop on Optical Communications and Computer Sciences (WOCCS) that was held in Toulouse, France, in March 1999. ...
doi:10.1364/ao.39.000671
pmid:18337941
fatcat:mowm54a6b5bidk3v5epnutx2a4
Shared Memory Multiprocessors
[chapter]
2004
Parallel Computing on Heterogeneous Networks
Interconnection networks The interconnection network design space has four dimensions: • Topology defines the physical interconnection structure of the network graph. ...
However, in a system based on an interconnection network, writes can be reordered by the network. ...
doi:10.1002/0471654167.ch3
fatcat:dvaj7kmetfgr7bkmdrmvzljwda
The Stanford Dash multiprocessor
1992
Computer
The design of the prototype has provided deeper insight into the architectural and implementation challenges that arise in a large-scale machine with a single address space. ...
The prototype will also serve as a platform for studying real applications and software on a large parallel system. ...
Acknowledgments This research was supported by DARPA contracts N00014-87-K-0828 and N00039-91-C-0138. In addition. ...
doi:10.1109/2.121510
fatcat:3vidyyjlpncg3flegtqzfm76ky
MEmory performance
2007
SIGARCH Computer Architecture News
In particular, the problem of hiding/tolerating memory latencies is exacerbated by wire-delay and power consumption issues. ...
In fact, it is the interaction between the static/dynamic features of the application and the system on which it executes that stresses the memory subsystem and pushes towards specific solutions. ...
network processors, in which Monchiero, Silvano et al. explore optimization techniques for synchronization mechanisms in MPSoCs relying on complex interconnection (Network-on-Chip), targeted to future ...
doi:10.1145/1327312.1327314
fatcat:lcl2hualvrcyhj3tancv7dpmyq
Memory performance
2006
SIGARCH Computer Architecture News
In particular, the problem of hiding/tolerating memory latencies is exacerbated by wire-delay and power consumption issues. ...
In fact, it is the interaction between the static/dynamic features of the application and the system on which it executes that stresses the memory subsystem and pushes towards specific solutions. ...
network processors, in which Monchiero, Silvano et al. explore optimization techniques for synchronization mechanisms in MPSoCs relying on complex interconnection (Network-on-Chip), targeted to future ...
doi:10.1145/1147349.1147352
fatcat:ckimojzbhbav5g5spt3l5of4gy
Energy-Efficient Hardware Prefetching for CMPs Using Heterogeneous Interconnects
2010
2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing
Our proposal is based on the fact that the wires used in the on-chip interconnection network can be designed with varying latency, bandwidth and power characteristics. ...
On the other hand, CMP designs are likely to be equipped with latency hiding techniques like hardware prefetching in order to reduce the negative impact on performance that, otherwise, high cache miss ...
We believe that the use of an appropriately adjusted LA-PC could reduce the small degradation in execution time experienced by our proposal. ...
doi:10.1109/pdp.2010.12
dblp:conf/pdp/FloresAA10
fatcat:zghoeyy4yfgfrgimg6tjeutske
Trends in shared memory multiprocessing
1997
Computer
Concerns include the performance of the I/O subsystem (both on the network and disk sides) and reliability, availability, and serviceability. • Scientific and engineering. ...
The first step in meeting this challenge is to carefully examine the current use of shared memory multiprocessing and arrive at intelligent projections of future use based on application and technology ...
Acknowledgments We thank Yale Patt, who initiated the set of task forces that allowed us to develop our thoughts in a creative environment in Hawaii. ...
doi:10.1109/2.642814
fatcat:mhsgglxwfvdrtc4c4ap6eshxxa
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
2010
ACM Transactions on Architecture and Code Optimization (TACO)
In this article, we explore the architectural-level implications of interconnection network design for CMPs with up to 128 fine-grain multithreaded cores. ...
We find that the interconnect has a large impact on performance, as it is responsible for 60% to 75% of the miss latency. ...
ACKNOWLEDGMENTS We sincerely thank Woongki Baek, Hari Kannan, Jacob Leverich, and the anonymous reviewers for their useful feedback on earlier versions of this manuscript. ...
doi:10.1145/1736065.1736069
fatcat:nbhnzmatgjbuzgffmji3wh6wey
DSM perspective: another point of view
1999
Proceedings of the IEEE
Gray for his interaction and tireless editing of two of the drafts. ...
ACKNOWLEDGMENT The authors would like to thank the editors and reviewers who stimulated them to solidify and clarify their position so as to present DSM in a less biased light. ...
Commercial performance depends on record throughput per second; disk access latency often hides computing or messaging latency. ...
doi:10.1109/5.747862
fatcat:47y3mush5jestl2gklmgdtghwi
The directory-based cache coherence protocol for the DASH multiprocessor
1990
SIGARCH Computer Architecture News
Unlike traditional snoopy coherence protocols, the DASH protocol does not rely on broadcast; instead it uses point-to-point messages sent between the processors and memories to keep caches consistent. ...
In this paper, we present the design of the DASH coherence protocol and discuss how it addresses the above issues. We also discuss our strategy for verifying the correctness of the protocol and briefly ...
In particular, we would like to thank Wolf-Dietrich Weber for creating the DASH simulator, Helen Davis and Stephen Goldschmidt for modifying their Tango simulator to interact with the DASH simulator, and ...
doi:10.1145/325096.325132
fatcat:gottedibh5hu7m3qufd44f6jmm
PicoServer
2006
SIGPLAN notices
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. ...
In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. ...
This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640. ...
doi:10.1145/1168918.1168873
fatcat:lwsq2dsbxfe7fgpak3wajsbrgu
PicoServer
2006
SIGARCH Computer Architecture News
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. ...
In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. ...
This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640. ...
doi:10.1145/1168919.1168873
fatcat:5caghuwhojfd5p4kdhaq5qckiu
PicoServer
2006
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems - ASPLOS-XII
The 3D technology also enables wide low-latency buses between processors and memory. These remove the need for an L2 cache allowing its area to be re-allocated to additional simple cores. ...
In this paper, we show how 3D stacking technology can be used to implement a simple, low-power, high-performance chip multiprocessor suitable for throughput processing. ...
This project is supported by the National Science Foundation under grants NSF-ITR CCR-0325898 and CCR-0219640. ...
doi:10.1145/1168857.1168873
dblp:conf/asplos/KgilDSBDMRF06
fatcat:didbkyujwfddjoviyhj6huxxum
Interconnections in Multi-Core Architectures
2005
SIGARCH Computer Architecture News
This paper examines the area, power, performance, and design issues for the on-chip interconnects on a chip multiprocessor, attempting to present a comprehensive view of a class of interconnect architectures ...
It shows that the design choices for the interconnect have significant effect on the rest of the chip, potentially consuming a significant fraction of the real estate and power budget. ...
This work was supported in part by NSF Grant No. CCR-0311683 and an IBM internship. ...
doi:10.1145/1080695.1070004
fatcat:nm44nhgbbzhkvdhj63vut6elyu