A copy of this work was available on the public web and has been preserved in the Wayback Machine; the capture dates from 2020.
The file type is application/pdf.
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
[article]
2020, arXiv pre-print
In recent years, there has been a flurry of research in deep neural network pruning and compression. Early approaches prune weights individually. However, it is difficult to take advantage of the resulting unstructured sparsity patterns on modern hardware like GPUs. As a result, pruning strategies which impose sparsity structures on the weights have become more popular. However, these structured pruning approaches typically lead to higher losses in accuracy than unstructured pruning. In this
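The individual-weight pruning the abstract refers to is commonly magnitude-based: the smallest-magnitude weights are zeroed, leaving a sparsity pattern with no block or channel structure. A minimal NumPy sketch of this idea (the function name and threshold rule here are illustrative, not taken from the paper):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of entries with the
    smallest absolute value, with no structural constraint.
    Illustrative sketch, not the paper's implementation."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))
pruned = magnitude_prune(w, 0.9)
# The surviving nonzeros are scattered irregularly, which is why
# dense GPU kernels cannot exploit this sparsity directly.
print(float(np.mean(pruned == 0)))
```

Because the nonzeros land at arbitrary positions, a standard dense matrix-multiply kernel does no less work on the pruned matrix; realizing a speedup from such unstructured patterns is the problem the paper targets.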
arXiv:2008.11849v1
fatcat:x4usrp5ocrhifkuicim3nujtlm