METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators
[article]
2021
arXiv
pre-print
We evaluate the co-design using different flit sizes for synthetic study, illustrating its effectiveness under various hardware resource constraints, in addition to a wide range of DNN models selected ...
In this work, we identify the inefficiency of the widely used traditional on-chip networks and the opportunity of software-hardware co-design. ...
Dual-Phase Routing The routing methods allocate the physical forwarding path for every traffic flow on the spatial accelerator. ...
arXiv:2108.10570v1
fatcat:b5od3uzu7reknnrm6lx3ie35hy
Achieving Spectrum Efficient Communication Under Cross-Technology Interference
[article]
2017
arXiv
pre-print
The capability of direct communication among heterogeneous devices brings great opportunities to harmoniously sharing the spectrum with collaboration rather than competition. ...
BACKGROUND: A wide range of wireless technologies, such as WiFi, Bluetooth, and ZigBee, share the common wireless medium of the unlicensed 2.4 GHz ISM band. ...
The most widely used method now is the deployment of multi-radio gateways as a bridge for connecting them. ...
arXiv:1706.09922v1
fatcat:wlgurgs24jbp3m7jg6rgdzqxrq
Rubik: A Hierarchical Architecture for Efficient Graph Learning
[article]
2020
arXiv
pre-print
Such a hierarchical paradigm facilitates the software and hardware accelerations for GCN learning. ...
We also propose a mapping methodology aware of data reuse and task-level parallelism to handle various graph inputs effectively. ...
Specifically, the Rubik accelerator supports both spatial and temporal data flow for regular (node-level) and irregular (graph-level) computing, enhanced with both G-D cache and G-C cache for graph-level data ...
arXiv:2009.12495v1
fatcat:c7alktpjfjdzhbfmnsbwivv74a
The middlebox manifesto
2011
Proceedings of the 10th ACM Workshop on Hot Topics in Networks - HotNets '11
To this end, our vision is a world with software-centric middlebox implementations running on general-purpose hardware platforms that are managed via open and extensible management APIs. ...
We make the case that enabling innovation in middleboxes is at least as important, if not more important, as that for traditional switches and routers. ...
Acknowledgments We thank Neil Doran, Patrick Egan, Sridhar Mahankali, Sanjay Rungta, Daniel Tang, and Rob Wilson for sharing their insights and feedback. ...
doi:10.1145/2070562.2070583
dblp:conf/hotnets/SekarRRES11
fatcat:ubg7xaxa3bguzakghr2k6aateu
Characteristics of workloads used in high performance and technical computing
2007
Proceedings of the 21st annual international conference on Supercomputing - ICS '07
For the selected workloads we provide a wide range of characterizations based on instruction tracing and hardware counter measurements. ...
The results of this work show that the HPC application space is surprisingly diverse, with some codes showing similar data sharing and locality properties with commercial applications. ...
Figure 5: Data Spatial Locality: This figure shows
Figure 6: Data Spatial vs. Temporal Locality: This figure encapsulates both spatial and temporal locality scores for each benchmark. ...
doi:10.1145/1274971.1274984
dblp:conf/ics/CheveresanRFS07
fatcat:ptpam3kzxzcebp6jm3m3cahlaa
Towards On-Demand I/O Forwarding in HPC Platforms
2020
2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)
I/O forwarding is an established and widely-adopted technique in HPC to reduce contention and improve I/O performance in the access to shared storage infrastructure. ...
We aim to explore when forwarding is the best choice for an application, how many I/O nodes it would benefit from, and whether not using forwarding at all might be the correct decision. ...
The authors acknowledge the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer, which have contributed to the research results reported ...
doi:10.1109/pdsw51947.2020.00007
fatcat:kyspvlotxvgndirwknuub4x6qe
Real-time Multi-Task Diffractive Deep Neural Networks via Hardware-Software Co-design
[article]
2021
arXiv
pre-print
Our experimental results demonstrate significant improvements in versatility and hardware efficiency, and also demonstrate the robustness of proposed multi-task D^2NN architecture under wide noise ranges ...
Recently, there are increasing efforts on optical neural networks and optical computing based DNNs hardware, which bring significant advantages for deep learning systems in terms of their power efficiency ...
The forward function for the i-th task with detector noise is shown in Equation 8: $c_i = \operatorname{argmax}/\operatorname{argmin}\left(\mathrm{det}(f(\theta_{\mathrm{share}}, \theta_i, X_i)) + N(\sigma, 0)\right),\ i = \{1, 2\}$ (8). We also considered the imperfection of the ...
arXiv:2012.08906v2
fatcat:ebfavr52rjhzpi3oyz4tflsdq4
Author Index
2020
2020 IEEE 33rd International System-on-Chip Conference (SOCC)
Accelerator for Audio and Visual Data Classification
Gao, Jiabao
TR4.2
207
FABLE-DTS: Hardware-Software Co-Design of a Fast and Stable Data Transmission System for FPGAs
Garg, Mohit
WR1.1 ...
Accelerator for Audio and Visual Data Classification
Peng, Yarui
FR6.2
277
Holistic 2.5D Design Flow: A 65nm Shared-Block Microcontroller Case Study
Poussier, Romain
FS5.1
248
Secure Your ...
doi:10.1109/socc49529.2020.9524726
fatcat:qluzc5nlwbbyrf7iy5d4pnsrhy
On the mobile wireless access via MIMO relays
2009
2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications
While the introduced hardware complexity improves the throughput in low and moderate SNR regime, the throughput is improved by relay scheduling for spatial reuse in high SNR regime. ...
This happens due to quick channel variation and its resulting protocol overheads that are especially large for multiuser multiple-input multiple-output (MIMO) systems. ...
The time-sharing penalty for high SNR users is still expensive even with more antennas at the relay. • Spatial reuse gains: The MIMO relay gains so far come from the additional hardware complexity in the ...
doi:10.1109/pimrc.2009.5449979
dblp:conf/pimrc/0001VK09
fatcat:evywaxryujhipc3ajggnmq6eui
Reconfigurable Data Planes for Scalable Network Virtualization
2013
IEEE transactions on computers
Our system implements forwarding tables in a shared fashion using inexpensive off-chip memories and supports both Internet Protocol (IP) and non-IP based data planes. ...
hardware. ...
The hardware data planes use an optimized hardware architecture that stores forwarding tables from multiple virtual data planes in a shared fashion using off-chip SRAM memories. ...
doi:10.1109/tc.2012.155
fatcat:sq7xdztilrdadniplvkrxrcwyq
A scalable learning system for video recognition
2005
2005 IEEE Aerospace Conference
We present an overview of the Harpo framework and describe a multilevel learning strategy used to optimize convolutional networks for particular features of interest in video data streams. ...
Some of the most successful demonstrations of end-to-end learning have been with convolutional, or shared weight networks. ...
The data consists of sequences of frames, each being 325 pixels wide by 256 pixels high with 3 colors and was recorded at 30 frames / second from a small (unstable) aircraft. ...
doi:10.1109/aero.2005.1559516
fatcat:faxqe7tzabe2zffolmjwlmfrem
Memory forwarding
1999
SIGARCH Computer Architecture News
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false sharing ...
To overcome this limitation, we propose a technique called memory forwarding which effectively adds a new layer of indirection within the memory system whenever necessary to guarantee that data relocation ...
Acknowledgments We thank Daniel Meneveaux for providing his radiosity program. Chi-Keung Luk is partially supported by a Canadian Commonwealth Fellowship. Todd C. ...
doi:10.1145/307338.300987
fatcat:33zcvqbsvvh3bkhbqu26za5rnq
Protozoa
2013
SIGARCH Computer Architecture News
, but also results in unnecessary coherence traffic for shared data. ...
In this paper, we present the design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality. ...
Excess coherence traffic due to false sharing is a widely studied problem. ...
doi:10.1145/2508148.2485969
fatcat:dq6zxrakmjhhjgysyvke2t2rim
Protozoa
2013
Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13
, but also results in unnecessary coherence traffic for shared data. ...
In this paper, we present the design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality. ...
Excess coherence traffic due to false sharing is a widely studied problem. ...
doi:10.1145/2485922.2485969
dblp:conf/isca/ZhaoSKD13
fatcat:ssdczij3pnfmlaenlf4a32ypzi
Mapping Scalable Video Coding decoder on multi-core stream processors
2009
2009 Picture Coding Symposium
SVC adopts layered coding techniques to improve coding efficiency for spatial and quality scalability. ...
We focus on mapping issues of spatial scalability supporting with various resolutions of decoded frames. ...
The upsampling mechanism (denoted as UP in Fig. 3), which upsamples shared data from the reference layer for the next higher layer, is an important module in the SVC decoder. ...
doi:10.1109/pcs.2009.5167370
dblp:conf/pcs/SuTCTC09
fatcat:mkp32ixzc5hrlowxqzw6zwnqna