Design and Evaluation of an Ultra Low-power Human-quality Speech Recognition System

Dennis Pinto, Jose-María Arnau, Antonio González
2020 ACM Transactions on Architecture and Code Optimization (TACO)  
Automatic Speech Recognition (ASR) has experienced a dramatic evolution since the pioneering development of Bell Labs' single-digit recognizer more than 50 years ago. Current ASR systems have taken advantage of the tremendous improvements in AI during the past decade by incorporating Deep Neural Networks into the system and pushing their accuracy to levels comparable to that of humans. This article describes and characterizes a representative ASR system with state-of-the-art accuracy and proposes a hardware platform capable of decoding speech in real time with a power dissipation close to 1 Watt. The software is based on the so-called hybrid approach, with a vocabulary of 200K words and RNN-based language model re-scoring, whereas the hardware consists of a commercially available low-power processor along with two accelerators used for the most compute-intensive tasks. The article shows that high performance can be obtained with very low power, enabling the deployment of these systems in extremely power-constrained environments such as mobile and IoT devices.
doi:10.1145/3425604 fatcat:73qovytiebbddjvjrfce5igvny