A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Proximu: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute
[article] arXiv pre-print, 2020
Deep Neural Network (DNN) inference is emerging as the fundamental bedrock for a multitude of utilities and services. CPUs continue to scale up their raw compute capabilities for DNN inference, along with mature high-performance libraries to extract optimal performance. While general-purpose CPUs offer unique, attractive advantages for DNN inference at both the datacenter and the edge, they have primarily evolved to optimize single-thread performance. For highly parallel, throughput-oriented DNN […]
arXiv:2011.11695v2
fatcat:pvt7pv6euba6zb4qsm2iydsp7u