In-Datacenter Performance Analysis of a Tensor Processing Unit
2017
Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU) and deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NN). ...
The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. ...
The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order. ...
doi:10.1145/3079856.3080246
dblp:conf/isca/JouppiYPPABBBBB17
fatcat:szzc6vdrjbb4pirhjitepszgai
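As a quick sanity check on the throughput quoted in this abstract, the 92 TOPS figure follows from simple arithmetic (assuming the 700 MHz clock rate reported in the paper, and counting each MAC as two operations, a multiply and an add):

```python
macs = 256 * 256      # the 65,536 8-bit MAC units of the matrix multiply unit
ops_per_mac = 2       # each MAC performs one multiply and one add per cycle
clock_hz = 700e6      # TPU clock rate reported in the paper

peak_tops = macs * ops_per_mac * clock_hz / 1e12
print(peak_tops)      # 91.7504, which rounds to the quoted 92 TOPS
```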
In-Datacenter Performance Analysis of a Tensor Processing Unit
[article]
2017
arXiv
pre-print
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU) and deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NN). ...
The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. ...
The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order. ...
arXiv:1704.04760v1
fatcat:btodsh4crratffycyq2frubd44
In-Datacenter Performance Analysis of a Tensor Processing Unit™
2017
the 44th International Symposium on Computer Architecture (ISCA)
unpublished
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU) and deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NN). ...
The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. ...
The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order. ...
fatcat:ogmjbcgkeza5naaabzc7u5zziy
DeepCog: cognitive network management in sliced 5G Networks with deep learning
2019
Zenodo
Moreover, we leverage DeepCog to carry out an extensive first analysis of the trade-off between capacity overdimensioning and unserviced demands in adaptive, sliced networks and in the presence of real-world ...
To this end, we present DeepCog, a novel data analytics tool for the cognitive management of resources in 5G systems. ...
ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by the H2020 5G-MoNArch project (Grant Agreement No. 761445) and the work of NEC Laboratories Europe by the 5G-Transformer project ...
doi:10.5281/zenodo.3298949
fatcat:i6ro2zmccrhexovnkm7gj5z7aa
DeepCog: Cognitive Network Management in Sliced 5G Networks with Deep Learning
2019
IEEE INFOCOM 2019 - IEEE Conference on Computer Communications
Moreover, we leverage DeepCog to carry out an extensive first analysis of the trade-off between capacity overdimensioning and unserviced demands in adaptive, sliced networks and in the presence of real-world ...
To this end, we present DeepCog, a novel data analytics tool for the cognitive management of resources in 5G systems. ...
ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by the H2020 5G-MoNArch project (Grant Agreement No. 761445) and the work of NEC Laboratories Europe by the 5G-Transformer project ...
doi:10.1109/infocom.2019.8737488
dblp:conf/infocom/BegaGFBC19
fatcat:n626ap3ldrhntlzwcpcgk6u7ne
DeepCog: Optimizing Resource Provisioning in Network Slicing with AI-based Capacity Forecasting
2019
IEEE Journal on Selected Areas in Communications
Extensive performance evaluations with real-world measurement data collected in a metropolitan-scale operational mobile network demonstrate the effectiveness of our proposed solution, which can reduce ...
To close this gap, we present DeepCog, a deep neural network architecture inspired by advances in image processing and trained via a dedicated loss function. ...
ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by H2020 5G-TOURS project (grant agreement no. 856950). ...
doi:10.1109/jsac.2019.2959245
fatcat:vjorfbql2ncp5i2ramhtguq2qe
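The overdimensioning-versus-unserviced-demand trade-off that the DeepCog abstracts describe can be illustrated with a toy asymmetric cost function. This is a hypothetical stand-in, not DeepCog's actual loss; the name `provisioning_cost` and the weight `alpha` are assumptions for illustration:

```python
def provisioning_cost(forecast, demand, alpha=5.0):
    """Toy asymmetric cost: unserviced demand (under-provisioning) is
    weighted alpha times more heavily than idle, overdimensioned capacity."""
    gap = forecast - demand
    if gap < 0:
        return alpha * -gap   # capacity shortfall: costly SLA violations
    return gap                # overdimensioning: wasted but cheaper resources
```

With `alpha > 1`, a forecaster trained to minimize such a cost learns to overprovision slightly rather than risk shortfalls, which is the behavior these papers tune.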
Comparative Study on CPU, GPU and TPU
2020
International Journal of Computer Science and Information Technology for Education
We briefly explain the concept and architecture of each, providing a comparative analysis of each unit. ...
In this paper, we discuss the evolution of and the need for change in hardware such as central processing units, graphical processing units and tensor processing units. ...
(Image copied from NVIDIA_CUDA_Tuotorial_No_NDA_Aprs08.pdf)
Tensor Processing Units (TPU) A tensor is an n-dimensional array. ...
doi:10.21742/ijcsite.2020.5.1.04
fatcat:julj3dpiwbhejfowx4cpbrnkm4
Microscope
2020
Proceedings of the 26th Annual International Conference on Mobile Computing and Networking
Microscope (i) transforms traffic data collected in irregular radio access deployments into a format suitable for convolutional learning, and (ii) can accommodate a variety of neural network architectures ...
The growing diversification of mobile services imposes requirements on network performance that are ever more stringent and heterogeneous. ...
The authors are grateful for the reviewers' constructive feedback, and for the shepherd's guidance during the revision process. ...
doi:10.1145/3372224.3419195
dblp:conf/mobicom/ZhangFZP20
fatcat:x6u5yyjhrrfv5g7ur3z4zgtgva
Proximu: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute
[article]
2020
arXiv
pre-print
Performance scales efficiently by distributing light-weight tensor compute near all caches in a multi-level cache hierarchy. ...
Across a number of DNN models, Proximu$ achieves a 2.3x increase in convolution performance/watt with a 2x to 3.94x scaling in raw performance. ...
We make the following key contributions in this paper. • We do a fundamental analysis of state-of-the-art implementations of multiple DNN-inference primitives executed on state-of-the-art datacenter CPUs ...
arXiv:2011.11695v2
fatcat:pvt7pv6euba6zb4qsm2iydsp7u
2019 Index IEEE Transactions on Services Computing Vol. 12
2020
IEEE Transactions on Services Computing
Zhao, Y., +, TSC May-June 2019, 489-502
Graphics processing units: Fairness-Efficiency Allocation of CPU-GPU Heterogeneous Resources. ... -Dec. 2019, 896-909
Transaction processing: A Proof-of-Trust Consensus Protocol for Enhancing Accountability in Crowdsourcing Services. ...
doi:10.1109/tsc.2020.2965435
fatcat:tim4rhxag5dqpbhvtgpwvp6ibm
Top Picks from the 2017 Computer Architecture Conferences
2018
IEEE Micro
I thank Benjamin Lee and Daniel Jiménez for handling articles with which I had a conflict of interest. ...
The community owes an enormous thanks to the entire selection committee, which diligently reviewed articles and endured the complexity of the ranking process despite poor support in the review software ...
In "Motivation for and Evaluation of the First Tensor Processing Unit," Norman Jouppi and colleagues describe the TPU architecture and its impact on performance in Google's production datacenters. ...
doi:10.1109/mm.2018.032271056
fatcat:jcs52oysenetto4iangvkkutty
Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept
[article]
2018
arXiv
pre-print
The accelerator described is able to reconfigure from (1) allocating all of a DNN's computations to a single worker, at one extreme of sub-optimal performance, to (2) optimally allocating workers per layer according ...
This speed-up is a consequence of hiding the delay of transporting activation outputs from one layer to the next in a DNN behind the computations in the receiving layer. ...
Many storage drives distributed in a datacenter or distributed across several datacenters are equipped with a relatively small version of a FProg-DNN in each storage unit. ...
arXiv:1802.04899v4
fatcat:xngvmzmz6bavvglgqyi6yjkxvy
Energy and Policy Considerations for Modern Deep Learning Research
2020
PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE
... as a result of non-renewable energy used to fuel modern tensor processing hardware. ...
In a paper published this year at ACL, we brought this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training and tuning neural network models ...
Whereas a decade ago most AI research could be performed on a commodity desktop computer, modern deep learning research increasingly requires access to a cluster containing specialized tensor processing ...
doi:10.1609/aaai.v34i09.7123
fatcat:vcqpz3pf7zabfjwp6asxpigjs4
Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator
[article]
2018
arXiv
pre-print
A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array based matrix multiplication unit at its core. ...
... accuracy (as low as 0.1%) and no run-time performance overhead. ...
An example of a systolic array based DNN accelerator is the Google Tensor Processing Unit (TPU), which uses a 256 × 256 grid of MAC units at its core, and provides 30× to 80× greater performance ...
arXiv:1802.04657v2
fatcat:mrr3fzd2wvh2vm4nmmel6tll4u
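The effect of a permanent fault on a systolic MAC grid, as studied in this paper, can be sketched in software. This is a hypothetical, output-stationary simplification (one accumulation step per "cycle"), not the TPU's actual weight-stationary dataflow; `mac_grid_matmul` and `stuck_cells` are names introduced here:

```python
import numpy as np

def mac_grid_matmul(A, B, stuck_cells=None):
    """Sketch of an output-stationary MAC grid: cell (i, j) accumulates
    A[i, k] * B[k, j] over cycles k. `stuck_cells` maps a cell to the
    value its output is frozen at, mimicking a permanent stuck-at fault."""
    n, m = A.shape[0], B.shape[1]
    C = np.zeros((n, m))
    for k in range(A.shape[1]):            # one accumulation step per cycle
        C += np.outer(A[:, k], B[k, :])    # every cell fires in parallel
    for (i, j), v in (stuck_cells or {}).items():
        C[i, j] = v                        # a faulty cell corrupts one output
    return C
```

The sketch makes the paper's observation concrete: a single stuck MAC cell corrupts a fixed row/column position of every output tile, which is why such faults can noticeably degrade classification accuracy without any mitigation.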
Adaptive Block Floating-Point for Analog Deep Learning Hardware
[article]
2022
arXiv
pre-print
We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark – realizing less than 1% loss in accuracy compared to FLOAT32. ...
We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output. ...
Tiled Matrix-Multiplication with ABFP Figure 1 sketches the process of performing a tiled matrix multiplication with ABFP dot products. ...
arXiv:2205.06287v1
fatcat:njlfxn3c5zh2xgh2p5vv3fdrpy
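Block floating-point, the representation this paper builds on, shares one exponent across a block of values so that only per-element mantissas differ. A minimal sketch follows; it is not the paper's exact ABFP scheme, and the block size and mantissa width are assumptions:

```python
import numpy as np

def block_fp_quantize(x, mantissa_bits=8, block=16):
    """Simplified block floating-point: each block of `block` values shares
    one exponent taken from its largest magnitude, so quantization error is
    bounded by half the block's least-significant-bit weight."""
    out = np.empty(len(x), dtype=float)
    for s in range(0, len(x), block):
        blk = x[s:s + block]
        exp = np.floor(np.log2(np.max(np.abs(blk)) + 1e-30))
        scale = 2.0 ** (exp - (mantissa_bits - 1))   # LSB weight for the block
        out[s:s + block] = np.round(blk / scale) * scale
    return out
```

The amplification (gain) idea mentioned in the abstract roughly corresponds to scaling values up before quantization so they occupy more of the shared-exponent mantissa range, raising accuracy without adding output bits.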
Showing results 1 — 15 out of 342 results