METRO: A Software-Hardware Co-Design of Interconnections for Spatial DNN Accelerators [article]

Zhao Wang, Guangyu Sun, Jingchen Zhu, Zhe Zhou, Yijiang Guo, Zhihang Yuan
2021 arXiv   pre-print
We evaluate the co-design using different flit sizes for synthetic study, illustrating its effectiveness under various hardware resource constraints, in addition to a wide range of DNN models selected  ...  In this work, we identify the inefficiency of the widely used traditional on-chip networks and the opportunity of software-hardware co-design.  ...  Dual-Phase Routing The routing methods allocate the physical forwarding path for every traffic flow on the spatial accelerator.  ... 
arXiv:2108.10570v1 fatcat:b5od3uzu7reknnrm6lx3ie35hy

Achieving Spectrum Efficient Communication Under Cross-Technology Interference [article]

Shuai Wang, Zhimeng Yin, Song Min Kim, Tian He
2017 arXiv   pre-print
The capability of direct communication among heterogeneous devices brings great opportunities to harmoniously sharing the spectrum with collaboration rather than competition.  ...  BACKGROUND A wide range of wireless technologies, such as WiFi, BlueTooth and ZigBee share the common wireless medium of the unlicensed 2.4GHz ISM band.  ...  The most widely used method now is the deployment of multiradio gateways a bridge for connecting them.  ... 
arXiv:1706.09922v1 fatcat:wlgurgs24jbp3m7jg6rgdzqxrq

Rubik: A Hierarchical Architecture for Efficient Graph Learning [article]

Xiaobing Chen, Yuke Wang, Xinfeng Xie, Xing Hu, Abanti Basak, Ling Liang, Mingyu Yan, Lei Deng, Yufei Ding, Zidong Du, Yunji Chen, Yuan Xie
2020 arXiv   pre-print
Such a hierarchical paradigm facilitates the software and hardware accelerations for GCN learning.  ...  We also propose a mapping methodology aware of data reuse and task-level parallelism to handle various graphs inputs effectively.  ...  Specifically, Rubik accelerator supports both spatial and temporal data flow for regular (node-level) and irregular (graph-level) computing, enhanced with both G-D cache and G-C cache for graph-level data  ... 
arXiv:2009.12495v1 fatcat:c7alktpjfjdzhbfmnsbwivv74a

The middlebox manifesto

Vyas Sekar, Sylvia Ratnasamy, Michael K. Reiter, Norbert Egi, Guangyu Shi
2011 Proceedings of the 10th ACM Workshop on Hot Topics in Networks - HotNets '11  
To this end, our vision is a world with software-centric middlebox implementations running on general-purpose hardware platforms that are managed via open and extensible management APIs.  ...  We make the case that enabling innovation in middleboxes is at least as important, if not more important, as that for traditional switches and routers.  ...  Acknowledgments We thank Neil Doran, Patrick Egan, Sridhar Mahankali, Sanjay Rungta, Daniel Tang, and Rob Wilson for sharing their insights and feedback.  ... 
doi:10.1145/2070562.2070583 dblp:conf/hotnets/SekarRRES11 fatcat:ubg7xaxa3bguzakghr2k6aateu

Characteristics of workloads used in high performance and technical computing

Razvan Cheveresan, Matt Ramsay, Chris Feucht, Ilya Sharapov
2007 Proceedings of the 21st annual international conference on Supercomputing - ICS '07  
For the selected workloads we provide a wide range of characterizations based on instruction tracing and hardware counter measurements.  ...  The results of this work show that the HPC application space is surprisingly diverse, with some codes showing similar data sharing and locality properties with commercial applications.  ...  Figure 5: Data Spatial Locality: This figure shows  ...  Figure 6: Data Spatial vs. Temporal Locality: This figure encapsulates both spatial and temporal locality scores for each benchmark.  ... 
doi:10.1145/1274971.1274984 dblp:conf/ics/CheveresanRFS07 fatcat:ptpam3kzxzcebp6jm3m3cahlaa

Towards On-Demand I/O Forwarding in HPC Platforms

Jean Luca Bez, Francieli Z. Boito, Alberto Miranda, Ramon Nou, Toni Cortes, Philippe O. A. Navaux
2020 2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)  
I/O forwarding is an established and widely-adopted technique in HPC to reduce contention and improve I/O performance in the access to shared storage infrastructure.  ...  We aim to explore when forwarding is the best choice for an application, how many I/O nodes it would benefit from, and whether not using forwarding at all might be the correct decision.  ...  The authors acknowledge the National Laboratory for Scientific Computing (LNCC/MCTI, Brazil) for providing HPC resources of the SDumont supercomputer, which have contributed to the research results reported  ... 
doi:10.1109/pdsw51947.2020.00007 fatcat:kyspvlotxvgndirwknuub4x6qe

Real-time Multi-Task Diffractive Deep Neural Networks via Hardware-Software Co-design [article]

Yingjie Li, Ruiyang Chen, Berardi Sensale Rodriguez, Weilu Gao, Cunxi Yu
2021 arXiv   pre-print
Our experimental results demonstrate significant improvements in versatility and hardware efficiency, and also demonstrate the robustness of proposed multi-task D^2NN architecture under wide noise ranges  ...  Recently, there are increasing efforts on optical neural networks and optical computing based DNNs hardware, which bring significant advantages for deep learning systems in terms of their power efficiency  ...  The forward function for the i-th task with detector noise is shown in Equation 8. $c_i = \operatorname{argmax}/\operatorname{argmin}\left(\mathrm{det}(f(\theta_{\mathrm{share}}, \theta_i, X_i)) + N(\sigma, 0)\right),\ i = \{1, 2\}$ (8) We also considered the imperfection of the  ... 
arXiv:2012.08906v2 fatcat:ebfavr52rjhzpi3oyz4tflsdq4

Author Index

2020 2020 IEEE 33rd International System-on-Chip Conference (SOCC)  
Accelerator for Audio and Visual Data Classification Gao, Jiabao TR4.2 207 FABLE-DTS: Hardware-Software Co-Design of a Fast and Stable Data Transmission System for FPGAs Garg, Mohit WR1.1  ...  Accelerator for Audio and Visual Data Classification Peng, Yarui FR6.2 277 Holistic 2.5D Design Flow: A 65Nm Shared- Block Microcontroller Case Study Poussier, Romain FS5.1 248 Secure Your  ... 
doi:10.1109/socc49529.2020.9524726 fatcat:qluzc5nlwbbyrf7iy5d4pnsrhy

On the mobile wireless access via MIMO relays

Tae Hyun Kim, Nitin H. Vaidya, Young-Bae Ko
2009 2009 IEEE 20th International Symposium on Personal, Indoor and Mobile Radio Communications  
While the introduced hardware complexity improves the throughput in low and moderate SNR regime, the throughput is improved by relay scheduling for spatial reuse in high SNR regime.  ...  This happens due to quick channel variation and its resulting protocol overheads that are especially large for multiuser multiple-input multiple-output (MIMO) systems.  ...  The time-sharing penalty for high SNR users is still expensive even with more antennas at the relay. • Spatial reuse gains: The MIMO relay gains so far come from the additional hardware complexity in the  ... 
doi:10.1109/pimrc.2009.5449979 dblp:conf/pimrc/0001VK09 fatcat:evywaxryujhipc3ajggnmq6eui

Reconfigurable Data Planes for Scalable Network Virtualization

Deepak Unnikrishnan, Ramakrishna Vadlamani, Yong Liao, Jeremie Crenne, Lixin Gao, Russell Tessier
2013 IEEE transactions on computers  
Our system implements forwarding tables in a shared fashion using inexpensive off-chip memories and supports both Internet Protocol (IP) and non-IP based data planes.  ...  hardware.  ...  The hardware data planes use an optimized hardware architecture that stores forwarding tables from multiple virtual data planes in a shared fashion using off-chip SRAM memories.  ... 
doi:10.1109/tc.2012.155 fatcat:sq7xdztilrdadniplvkrxrcwyq

A scalable learning system for video recognition

R. Porter, C. Chakrabarti, N. Harvey, G. Kenyon
2005 2005 IEEE Aerospace Conference  
We present an overview of the Harpo framework and describe a multilevel learning strategy used to optimize convolutional networks for particular features of interest in video data streams.  ...  Some of the most successful demonstrations of end-to-end learning have been with convolutional, or shared weight networks.  ...  The data consists of sequences of frames, each being 325 pixels wide by 256 pixels high with 3 colors and was recorded at 30 frames / second from a small (unstable) aircraft.  ... 
doi:10.1109/aero.2005.1559516 fatcat:faxqe7tzabe2zffolmjwlmfrem

Memory forwarding

Chi-Keung Luk, Todd C. Mowry
1999 SIGARCH Computer Architecture News  
By optimizing data layout at run-time, we can potentially enhance the performance of caches by actively creating spatial locality, facilitating prefetching, and avoiding cache conflicts and false sharing  ...  To overcome this limitation, we propose a technique called memory forwarding which effectively adds a new layer of indirection within the memory system whenever necessary to guarantee that data relocation  ...  Acknowledgments We thank Daniel Meneveaux for providing his radiosity program. Chi-Keung Luk is partially supported by a Canadian Commonwealth Fellowship. Todd C.  ... 
doi:10.1145/307338.300987 fatcat:33zcvqbsvvh3bkhbqu26za5rnq


Hongzhou Zhao, Arrvindh Shriraman, Snehasish Kumar, Sandhya Dwarkadas
2013 SIGARCH Computer Architecture News  
, but also results in unnecessary coherence traffic for shared data.  ...  In this paper, we present the design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality.  ...  Excess coherence traffic due to false sharing is a widely studied problem.  ... 
doi:10.1145/2508148.2485969 fatcat:dq6zxrakmjhhjgysyvke2t2rim


Hongzhou Zhao, Arrvindh Shriraman, Snehasish Kumar, Sandhya Dwarkadas
2013 Proceedings of the 40th Annual International Symposium on Computer Architecture - ISCA '13  
, but also results in unnecessary coherence traffic for shared data.  ...  In this paper, we present the design of Protozoa, a family of coherence protocols that eliminate unnecessary coherence traffic and match data movement to an application's spatial locality.  ...  Excess coherence traffic due to false sharing is a widely studied problem.  ... 
doi:10.1145/2485922.2485969 dblp:conf/isca/ZhaoSKD13 fatcat:ssdczij3pnfmlaenlf4a32ypzi

Mapping Scalable Video Coding decoder on multi-core stream processors

Yu-Chi Su, Sung-Fang Tsai, Tzu-Der Chuang, You-Ming Tsao, Liang-Gee Chen
2009 2009 Picture Coding Symposium  
SVC adopts layered coding techniques to improve coding efficiency for spatial and quality scalability.  ...  We focus on mapping issues of spatial scalability supporting with various resolutions of decoded frames.  ...  The upsampling mechanism (denoted as UP in Fig.3 ), which upsamples shared data from the reference layer for the next higher layer, is an important module in SVC decoder.  ... 
doi:10.1109/pcs.2009.5167370 dblp:conf/pcs/SuTCTC09 fatcat:mkp32ixzc5hrlowxqzw6zwnqna