Wake-Up-Word Feature Extraction on FPGA

Veton Z. Këpuska, Mohamed M. Eljhani, Brian H. Hight
2014 World Journal of Engineering and Technology  
The Wake-Up-Word Speech Recognition (WUW-SR) task is computationally very demanding, particularly the feature extraction stage, whose output is decoded with corresponding Hidden Markov Models (HMMs) in the back-end stage of the WUW-SR. The state-of-the-art WUW-SR system is based on three different sets of features: Mel-Frequency Cepstral Coefficients (MFCC), Linear Predictive Coding Coefficients (LPC), and Enhanced Mel-Frequency Cepstral Coefficients (ENH_MFCC). In "Front-End of Wake-Up-Word Speech Recognition System Design on FPGA" [1], we presented an experimental FPGA design and implementation of a novel architecture of a real-time spectrogram extraction processor that generates MFCC, LPC, and ENH_MFCC spectrograms simultaneously. In this paper, the details of converting the three sets of spectrograms (MFCC, LPC, and ENH_MFCC) to their equivalent features are presented. In the WUW-SR system, the recognizer's front-end is located at the terminal, which is typically connected over a data network to a remote back-end recognizer (e.g., a server). The WUW-SR is shown in Figure 1. The three sets of speech features are extracted at the front-end. These extracted features are then compressed and transmitted to the server via a dedicated channel, where they are subsequently decoded.
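As a rough software illustration of one of the three feature sets named above, the sketch below computes LPC coefficients for a single speech frame with the textbook autocorrelation method and the Levinson-Durbin recursion. It is not the FPGA architecture described in the paper: the Hamming window, the predictor order of 10, and the function names `autocorrelation`, `levinson_durbin`, and `lpc` are all assumptions chosen for illustration.

```python
import math


def autocorrelation(frame, max_lag):
    """Return the biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, err), where a[0] == 1.0 and the prediction error filter
    is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def lpc(frame, order=10):
    """Hamming-window one frame and return (coefficients, residual energy).

    The window choice and default order are assumptions, not values
    taken from the paper.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1))
              for i in range(n)]
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

Calling `lpc(frame)` on a list of samples returns the coefficient vector together with the residual prediction energy. A hardware front-end would more likely use fixed-point arithmetic and a pipelined form of the same recursion, but that detail is outside what the abstract states.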
doi:10.4236/wjet.2014.21001 fatcat:j5m3dkkjtrdmbd2cy6uqsdf6fq