Compression of End-to-End Models

Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang, Chung-Cheng Chiu
Interspeech 2018
End-to-end models, which directly output text given speech using a single neural network, have been shown to be competitive with conventional speech recognition models containing separate acoustic, pronunciation, and language model components. Such models do not require additional resources for decoding and are typically much smaller than conventional models. This makes them particularly attractive in the context of on-device speech recognition, where both a small memory footprint and low power consumption are critical. This work explores the problem of compressing end-to-end models with the goal of satisfying device constraints without sacrificing model accuracy. We evaluate matrix factorization, knowledge distillation, and parameter sparsity to determine the most effective methods given constraints such as a fixed parameter budget.
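As a minimal sketch of one of the three techniques named above, the snippet below illustrates low-rank matrix factorization with NumPy. It is not the paper's implementation; the shapes, rank, and helper name are illustrative. A weight matrix W of shape m×n is replaced by two factors A (m×r) and B (r×n) from a truncated SVD, reducing the parameter count from m·n to r·(m+n) when r is small.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Return factors A, B such that A @ B is the best rank-`rank`
    approximation of W in Frobenius norm (Eckart-Young theorem).
    Illustrative helper, not from the paper."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold singular values into the left factor
    B = Vt[:rank, :]
    return A, B

# Hypothetical 512x512 layer compressed to rank 64.
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = low_rank_factorize(W, rank=64)

original_params = W.size            # 512 * 512 = 262144
compressed_params = A.size + B.size # 64 * (512 + 512) = 65536
print(compressed_params / original_params)  # -> 0.25
```

At full rank the factorization reconstructs W exactly; shrinking the rank trades approximation error for a smaller parameter budget, which is the kind of accuracy/size trade-off the paper evaluates.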
doi:10.21437/interspeech.2018-1025 dblp:conf/interspeech/PangSPGWZC18