Accelerating Natural Language Understanding in Task-Oriented Dialog

Ojas Ahuja, Shrey Desai
2020, arXiv preprint
Task-oriented dialog models typically leverage complex neural architectures and large-scale, pre-trained Transformers to achieve state-of-the-art performance on popular natural language understanding benchmarks. However, these models frequently have in excess of tens of millions of parameters, making them impossible to deploy on-device where resource-efficiency is a major concern. In this work, we show that a simple convolutional model compressed with structured pruning achieves largely comparable results to BERT on ATIS and Snips, with under 100K parameters. Moreover, we perform acceleration experiments on CPUs, where we observe our multi-task model predicts intents and slots nearly 63x faster than even DistilBERT.
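The abstract's key compression technique is structured pruning, which removes entire filters (rather than individual weights) so the resulting model stays dense and fast on CPUs. The paper's exact criterion is not given here; the following is a minimal, hypothetical sketch assuming a common variant: rank each convolutional filter by its L2 norm and keep only the strongest fraction.

```python
# Hypothetical sketch of structured (filter-level) pruning; not the
# authors' exact method. Each filter is represented as a flat list of
# weights, and whole filters are dropped by L2-norm ranking.
import math

def prune_filters(filters, keep_ratio):
    """Keep the top `keep_ratio` fraction of filters by L2 norm."""
    scored = sorted(
        filters,
        key=lambda w: math.sqrt(sum(x * x for x in w)),
        reverse=True,
    )
    n_keep = max(1, int(len(scored) * keep_ratio))
    return scored[:n_keep]

filters = [[0.9, -0.8], [0.01, 0.02], [0.5, 0.4], [0.0, 0.1]]
pruned = prune_filters(filters, keep_ratio=0.5)
print(len(pruned))  # 2 filters survive
```

Because entire filters are removed, the pruned layer is simply a smaller dense layer, which is what makes this style of compression attractive for the on-device CPU inference setting the abstract targets.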
arXiv:2006.03701v1