Minimum variance distortionless response spectral estimation

M. Wolfel, J. McDonough
2005 IEEE Signal Processing Magazine  
Spectral analysis is a fundamental part of many speech processing algorithms, including compression, coding, voice conversion, and feature extraction for automatic recognition. These applications present a variety of requirements: spectral resolution, variance of the estimated spectra, and capacity to model the frequency response function of the vocal tract during voiced speech. To satisfy these requirements, a broad variety of solutions has been proposed in the literature, all of which can be classified as either parametric methods, those using a small number of parameters estimated from the data [e.g., linear prediction (LP)], or nonparametric methods, those based on periodograms (e.g., the power spectrum). In this article, we will concentrate on spectral estimation techniques that are useful in extracting the features to be used by an automatic speech recognition (ASR) system. As an aid to understanding the spectral estimation process for speech signals, we adopt the source-filter model of speech production [1], wherein speech is divided into two broad classes: voiced and unvoiced. Voiced speech is quasi-periodic, consisting of a fundamental frequency corresponding to the pitch of a speaker, as well as its harmonics. Unvoiced speech is stochastic in nature and is best modeled as white noise convolved with an infinite impulse response filter. The extraction of cepstral features for ASR is traditionally based on either Mel-scaled frequency coefficients, LP [2], or perceptual LP [3]. Though widely used, the basis of each of these feature extraction schemes [namely, the Fourier transformation (FT) or LP] is ill-suited for the reliable estimation of the spectra of speech signals,
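The distinction between nonparametric and parametric estimators drawn in the abstract can be illustrated with a minimal Python sketch. It is not the article's method and does not implement MVDR; it only contrasts a windowed periodogram with an LP (all-pole) envelope computed by the autocorrelation method. The sampling rate, DFT length, LP order, the 1 kHz resonance, and the function names `periodogram` and `lp_spectrum` are all assumptions chosen for illustration. The test signal follows the unvoiced-speech model described above: white noise passed through a simple infinite impulse response filter.

```python
import numpy as np

FS = 8000      # sampling rate in Hz (assumed for illustration)
N_FFT = 512    # DFT length (assumed for illustration)

def periodogram(frame, n_fft=N_FFT):
    """Nonparametric estimate: squared magnitude of the windowed DFT."""
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed, n_fft)) ** 2 / len(frame)

def lp_spectrum(frame, order=12, n_fft=N_FFT):
    """Parametric estimate: all-pole LP envelope, autocorrelation method."""
    w = frame * np.hamming(len(frame))
    # Biased autocorrelation at lags 0..order
    r = np.array([np.dot(w[:len(w) - k], w[k:]) for k in range(order + 1)])
    r = r / len(w)
    # Toeplitz normal equations R a = r[1:], solved directly for clarity
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    err = r[0] - np.dot(a, r[1:])            # prediction error power
    # Prediction error filter A(z) = 1 - sum_k a_k z^{-k}
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n_fft)
    return err / np.abs(A) ** 2

# Synthetic "unvoiced" signal: white noise through a two-pole IIR filter
# with a resonance near 1 kHz (hypothetical values, not from the article).
rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)
radius, theta = 0.95, 2 * np.pi * 1000 / FS
a1, a2 = 2 * radius * np.cos(theta), -radius ** 2
x = np.zeros_like(noise)
for n in range(len(noise)):
    x[n] = noise[n]
    if n >= 1:
        x[n] += a1 * x[n - 1]
    if n >= 2:
        x[n] += a2 * x[n - 2]

frame = x[-512:]                              # drop the filter's start-up transient
freqs = np.fft.rfftfreq(N_FFT, d=1.0 / FS)    # bin centre frequencies in Hz
P = periodogram(frame)                        # jagged, high-variance estimate
L = lp_spectrum(frame)                        # smooth envelope from 12 coefficients
```

Plotting `10 * np.log10(P)` and `10 * np.log10(L)` against `freqs` should show the trade-off the abstract alludes to: the periodogram fluctuates strongly from bin to bin, while the LP envelope is smooth but constrained by its small number of parameters.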
doi:10.1109/msp.2005.1511829 fatcat:zsdqdo5ierbujcwx573etqifs4