342 Hits in 2.4 sec

In-Datacenter Performance Analysis of a Tensor Processing Unit

Norman P. Jouppi, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Cliff Young (+64 others)
2017 Proceedings of the 44th Annual International Symposium on Computer Architecture - ISCA '17  
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN).  ...  The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory.  ...  The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order.  ... 
doi:10.1145/3079856.3080246 dblp:conf/isca/JouppiYPPABBBBB17 fatcat:szzc6vdrjbb4pirhjitepszgai
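The 92 TOPS peak figure quoted in the abstract above follows directly from the size of the MAC array; a quick back-of-the-envelope check, assuming the 700 MHz clock reported in the paper and counting each MAC as two operations (one multiply, one add):

```python
# Back-of-the-envelope check of the TPU's quoted 92 TOPS peak.
# Assumes the 700 MHz clock reported in the paper; each MAC counts
# as two operations (multiply + accumulate).
macs = 256 * 256          # 65,536 8-bit MACs in the matrix unit
clock_hz = 700e6          # clock frequency (assumed from the paper)
ops_per_mac = 2           # one multiply + one add per cycle

peak_tops = macs * ops_per_mac * clock_hz / 1e12
print(f"{peak_tops:.1f} TOPS")  # ~91.8, i.e. the ~92 TOPS quoted above
```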

In-Datacenter Performance Analysis of a Tensor Processing Unit [article]

Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao (+59 others)
2017 arXiv   pre-print
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN).  ...  The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory.  ...  The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order.  ... 
arXiv:1704.04760v1 fatcat:btodsh4crratffycyq2frubd44

In-Datacenter Performance Analysis of a Tensor Processing Unit™

Norman Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-Luc Cantin (+63 others)
2017 the 44th International Symposium on Computer Architecture (ISCA)   unpublished
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN).  ...  The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory.  ...  The first four authors did the bulk of the evaluation in this paper, which is why they are in front, with the rest in alphabetical order.  ... 
fatcat:ogmjbcgkeza5naaabzc7u5zziy

DeepCog: cognitive network management in sliced 5G Networks with deep learning

Dario Bega, Marco Gramaglia, Marco Fiore, Albert Banchs, Xavier Costa-Perez
2019 Zenodo  
Moreover, we leverage DeepCog to carry out an extensive first analysis of the trade-off between capacity overdimensioning and unserviced demands in adaptive, sliced networks and in the presence of real-world  ...  To this end, we present DeepCog, a novel data analytics tool for the cognitive management of resources in 5G systems.  ...  ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by the H2020 5G-MoNArch project (Grant Agreement No. 761445) and the work of NEC Laboratories Europe by the 5G-Transformer project  ... 
doi:10.5281/zenodo.3298949 fatcat:i6ro2zmccrhexovnkm7gj5z7aa

DeepCog: Cognitive Network Management in Sliced 5G Networks with Deep Learning

Dario Bega, Marco Gramaglia, Marco Fiore, Albert Banchs, Xavier Costa-Perez
2019 IEEE INFOCOM 2019 - IEEE Conference on Computer Communications  
Moreover, we leverage DeepCog to carry out an extensive first analysis of the trade-off between capacity overdimensioning and unserviced demands in adaptive, sliced networks and in the presence of real-world  ...  To this end, we present DeepCog, a novel data analytics tool for the cognitive management of resources in 5G systems.  ...  ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by the H2020 5G-MoNArch project (Grant Agreement No. 761445) and the work of NEC Laboratories Europe by the 5G-Transformer project  ... 
doi:10.1109/infocom.2019.8737488 dblp:conf/infocom/BegaGFBC19 fatcat:n626ap3ldrhntlzwcpcgk6u7ne

DeepCog: Optimizing Resource Provisioning in Network Slicing with AI-based Capacity Forecasting

Dario Bega, Marco Gramaglia, Marco Fiore, Albert Banchs, Xavier Costa-Perez
2019 IEEE Journal on Selected Areas in Communications  
Extensive performance evaluations with real-world measurement data collected in a metropolitan-scale operational mobile network demonstrate the effectiveness of our proposed solution, which can reduce  ...  To close this gap, we present DeepCog, a deep neural network architecture inspired by advances in image processing and trained via a dedicated loss function.  ...  ACKNOWLEDGMENTS The work of University Carlos III of Madrid was supported by H2020 5G-TOURS project (grant agreement no. 856950).  ... 
doi:10.1109/jsac.2019.2959245 fatcat:vjorfbql2ncp5i2ramhtguq2qe
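The "dedicated loss function" mentioned in the DeepCog abstract above trades off overdimensioning against unserviced demand, penalizing the two errors asymmetrically. The paper's actual cost function is defined there; the general shape of such a loss can be sketched as follows, with illustrative weights that are not taken from the paper:

```python
# Illustrative asymmetric provisioning loss: under-forecasting capacity
# (causing unserviced demand) costs more than over-forecasting (which
# merely wastes resources). The weights are made up for illustration;
# DeepCog's actual loss function is defined in the paper.
def provisioning_loss(forecast, demand, under_weight=10.0, over_weight=1.0):
    err = forecast - demand
    if err < 0:                       # underprovisioned: demand goes unserved
        return under_weight * (-err)
    return over_weight * err          # overprovisioned: capacity wasted

print(provisioning_loss(8.0, 10.0))   # under by 2 units -> 20.0
print(provisioning_loss(12.0, 10.0))  # over by 2 units  -> 2.0
```

Minimizing this kind of loss pushes a forecaster to slightly overprovision rather than risk unserved demand, which is the trade-off the abstracts above describe.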

Comparative Study on CPU, GPU and TPU

P. Siva Raj and Ch. Sekhar (Dept. of Computer Science Engineering, Vignan's IIT (Autonomous), AP, India)
2020 International Journal of Computer Science and Information Technology for Education  
Briefly explain the concept of each with architecture. Providing a comparative analysis of each unit.  ...  In this paper, we are going to discuss the evolutionary and need for change in hardware such as central processing units, graphical processing units and tensor processing units.  ...  (Image copied from NVIDIA_CUDA_Tuotorial_No_NDA_Aprs08.pdf) Tensor Processing Units (TPU) A Tensor is an n-dimensional network.  ... 
doi:10.21742/ijcsite.2020.5.1.04 fatcat:julj3dpiwbhejfowx4cpbrnkm4

Microscope

Chaoyun Zhang, Marco Fiore, Cezary Ziemlicki, Paul Patras
2020 Proceedings of the 26th Annual International Conference on Mobile Computing and Networking  
Microscope (i) transforms traffic data collected in irregular radio access deployments into a format suitable for convolutional learning, and (ii) can accommodate a variety of neural network architectures  ...  The growing diversification of mobile services imposes requirements on network performance that are ever more stringent and heterogeneous.  ...  The authors are grateful for the reviewers' constructive feedback, and for the shepherd's guidance during the revision process.  ... 
doi:10.1145/3372224.3419195 dblp:conf/mobicom/ZhangFZP20 fatcat:x6u5yyjhrrfv5g7ur3z4zgtgva

Proximu$: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute [article]

Anant V. Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om J. Omer, Avishaii Abuhatzera, Belliappa Kuttanna, Sreenivas Subramoney
2020 arXiv   pre-print
Performance scales efficiently by distributing light-weight tensor compute near all caches in a multi-level cache hierarchy.  ...  Across a number of DNN models, Proximu$ achieves a 2.3x increase in convolution performance/watt with a 2x to 3.94x scaling in raw performance.  ...  We make the following key contributions in this paper. • We do a fundamental analysis of state-of-the-art implementations of multiple DNN-inference primitives executed on state-of-the-art datacenter CPUs  ... 
arXiv:2011.11695v2 fatcat:pvt7pv6euba6zb4qsm2iydsp7u

2019 Index IEEE Transactions on Services Computing Vol. 12

2020 IEEE Transactions on Services Computing  
Zhao, Y., +, TSC May-June 2019 489-502 Graphics processing units Fairness-Efficiency Allocation of CPU-GPU Heterogeneous Resources.  ...  -Dec. 2019 896-909 Transaction processing A Proof-of-Trust Consensus Protocol for Enhancing Accountability in Crowdsourcing Services.  ... 
doi:10.1109/tsc.2020.2965435 fatcat:tim4rhxag5dqpbhvtgpwvp6ibm

Top Picks from the 2017 Computer Architecture Conferences

Thomas F. Wenisch
2018 IEEE Micro  
I thank Benjamin Lee and Daniel Jiménez for handling articles with which I had a conflict of interest.  ...  The community owes an enormous thanks to the entire selection committee, which diligently reviewed articles and endured the complexity of the ranking process despite poor support in the review software  ...  In "Motivation for and Evaluation of the First Tensor Processing Unit," Norman Jouppi and colleagues describe the TPU architecture and its impact on performance in Google's production datacenters.  ... 
doi:10.1109/mm.2018.032271056 fatcat:jcs52oysenetto4iangvkkutty

Field-Programmable Deep Neural Network (DNN) Learning and Inference accelerator: a concept [article]

Luiz M Franca-Neto
2018 arXiv   pre-print
The accelerator described is able to reconfigure from (1) allocating all of a DNN's computations to a single worker in one extreme of sub-optimal performance to (2) optimally allocating workers per layer according  ...  This speed-up is a consequence of hiding the delay in transporting activation outputs from one layer to the next in a DNN behind the computations in the receiving layer.  ...  Many storage drives distributed in a datacenter or distributed across several datacenters are equipped with a relatively small version of a FProg-DNN in each storage unit.  ... 
arXiv:1802.04899v4 fatcat:xngvmzmz6bavvglgqyi6yjkxvy

Energy and Policy Considerations for Modern Deep Learning Research

Emma Strubell, Ananya Ganesh, Andrew McCallum
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
, as a result of non-renewable energy used to fuel modern tensor processing hardware.  ...  In a paper published this year at ACL, we brought this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training and tuning neural network models  ...  Whereas a decade ago most AI research could be performed on a commodity desktop computer, modern deep learning research increasingly requires access to a cluster containing specialized tensor processing  ... 
doi:10.1609/aaai.v34i09.7123 fatcat:vcqpz3pf7zabfjwp6asxpigjs4

Analyzing and Mitigating the Impact of Permanent Faults on a Systolic Array Based Neural Network Accelerator [article]

Jeff Zhang, Tianyu Gu, Kanad Basu, Siddharth Garg
2018 arXiv   pre-print
A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic array based matrix multiplication unit at its core.  ...  accuracy (as low as 0.1%) and no run-time performance overhead.  ...  An example of a systolic array based DNN accelerator is the Google Tensor Processing Unit (TPU), which uses a 256 × 256 grid of MAC units at its core, and provides between 30× and 80× greater performance  ... 
arXiv:1802.04657v2 fatcat:mrr3fzd2wvh2vm4nmmel6tll4u
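The systolic-array matrix unit described in the entry above holds one weight per grid cell and accumulates partial sums as activations stream through. Its net effect is an ordinary matrix multiplication, which a behavioral (not cycle-accurate) sketch can model; the function name and structure here are illustrative only:

```python
# Behavioral sketch of a weight-stationary systolic matrix unit:
# each grid cell (i, j) holds one weight and performs one MAC per step
# as activations stream through; partial sums accumulate per output.
# Functional model only, not cycle-accurate.
def systolic_matmul(activations, weights):
    # activations: M x K matrix, weights: K x N matrix (preloaded into the grid)
    m, k = len(activations), len(weights)
    n = len(weights[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for step in range(k):      # one MAC per cell per step
                out[i][j] += activations[i][step] * weights[step][j]
    return out

a = [[1, 2], [3, 4]]
w = [[5, 6], [7, 8]]
print(systolic_matmul(a, w))  # [[19, 22], [43, 50]]
```

A permanent fault in one MAC cell, the failure mode the paper above analyzes, would corrupt every partial sum flowing through that cell's row/column position.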

Adaptive Block Floating-Point for Analog Deep Learning Hardware [article]

Ayon Basumallik, Darius Bunandar, Nicholas Dronen, Nicholas Harris, Ludmila Levkova, Calvin McCarter, Lakshmi Nair, David Walter, David Widemann
2022 arXiv   pre-print
We evaluate the effectiveness of ABFP on the DNNs in the MLPerf datacenter inference benchmark – realizing less than 1% loss in accuracy compared to FLOAT32.  ...  We also introduce amplification (or gain) as a method for increasing the accuracy of the number representation without increasing the bit precision of the output.  ...  Tiled Matrix-Multiplication with ABFP Figure 1 sketches the process of performing a tiled matrix multiplication with ABFP dot products.  ... 
arXiv:2205.06287v1 fatcat:njlfxn3c5zh2xgh2p5vv3fdrpy
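Block floating-point, the representation behind the ABFP entry above, shares a single exponent across a tile of values so that each value keeps only a low-bit integer mantissa. A minimal sketch of that core idea follows; the adaptive per-tile exponent selection and the gain amplification described in the abstract are omitted, and all names here are illustrative:

```python
import math

# Minimal block floating-point sketch: one shared exponent per tile,
# low-bit integer mantissas per value. ABFP's adaptive exponent choice
# and gain amplification (per the abstract above) are not modeled.
def bfp_quantize(tile, mantissa_bits=8):
    max_abs = max(abs(x) for x in tile)
    if max_abs == 0.0:
        return [0] * len(tile), 0
    shared_exp = math.ceil(math.log2(max_abs))        # one exponent per tile
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    mantissas = [round(x / scale) for x in tile]      # low-bit integers
    return mantissas, shared_exp

def bfp_dequantize(mantissas, shared_exp, mantissa_bits=8):
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]

tile = [0.5, -1.25, 3.0, 0.01]
m, e = bfp_quantize(tile)
approx = bfp_dequantize(m, e)
print(max(abs(a - b) for a, b in zip(tile, approx)))  # small quantization error
```

Values much smaller than the tile's maximum lose precision (here 0.01 rounds to 0), which is why the paper measures end-to-end accuracy loss against FLOAT32 rather than assuming the format is harmless.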
Showing results 1–15 of 342