EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference
arXiv pre-print, 2021 · arXiv:2011.14203v5
Transformer-based language models such as BERT provide significant accuracy improvements for a multitude of natural language processing (NLP) tasks. However, their hefty computational and memory demands make them challenging to deploy on resource-constrained edge platforms with strict latency requirements. We present EdgeBERT, an in-depth algorithm-hardware co-design for latency-aware energy optimization for multi-task NLP. EdgeBERT employs entropy-based early exit predication in order to perform dynamic voltage-frequency scaling (DVFS), at a sentence granularity, for holistic energy savings with minimal accuracy degradation.