A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade
2021
Findings of the Association for Computational Linguistics: EMNLP 2021
Dynamic early exiting aims to accelerate the inference of pre-trained language models (PLMs) by emitting predictions from internal layers instead of passing through the entire model. In this paper, we empirically analyze the working mechanism of dynamic early exiting and find that it faces a performance bottleneck under high speed-up ratios. On the one hand, the PLMs' representations in shallow layers lack high-level semantic information and thus are not sufficient for accurate predictions. On the other
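The mechanism the abstract analyzes can be illustrated with a minimal sketch of confidence-based early exiting: each internal layer attaches a classifier, and inference stops at the first layer whose prediction entropy falls below a threshold. This is a generic illustration, not CascadeBERT's actual implementation; the function names, the toy logits, and the threshold value are assumptions for the example.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy of a probability distribution (nats).
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_exit(layer_logits, threshold=0.3):
    """Return (prediction, exit_layer): emit at the first internal
    classifier whose softmax entropy falls below the threshold."""
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        if entropy(probs) < threshold:
            return probs.index(max(probs)), depth
    # No layer was confident enough: fall back to the final layer.
    probs = softmax(layer_logits[-1])
    return probs.index(max(probs)), len(layer_logits)

# Hypothetical logits from three internal classifiers: the shallow
# layer is uncertain (high entropy), the second is confident enough
# to exit, so the third layer is never evaluated.
layers = [[0.2, 0.1], [3.0, -2.0], [4.0, -3.0]]
pred, depth = early_exit(layers, threshold=0.3)
# → pred = 0, depth = 2 (exits after the second layer)
```

The speed-up comes from skipping the remaining layers once confidence is reached; the bottleneck the paper identifies arises when the threshold is loosened for higher speed-ups, forcing predictions from shallow layers whose representations are not yet semantically rich enough.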
doi:10.18653/v1/2021.findings-emnlp.43
fatcat:67vepzribjhutg6dys7vvfgjeq