A Survey on Model Compression for Natural Language Processing

Canwen Xu, Julian McAuley
2022, arXiv preprint
With recent developments in new architectures like the Transformer and in pretraining techniques, significant progress has been made in applications of natural language processing (NLP). However, the high energy cost and long inference latency of Transformer models are preventing NLP from entering broader scenarios, including edge and mobile computing. Efficient NLP research aims to comprehensively consider computation, time, and carbon emissions across the entire life-cycle of NLP, including data preparation, model training, and inference. In this survey, we focus on the inference stage and review the current state of model compression for NLP, including benchmarks, metrics, and methodology. We outline the current obstacles and future research directions.
arXiv:2202.07105v1