Feature Reduction Method for Speaker Identification Systems Using Particle Swarm Optimization

Ahmed Al-Hmouz, Khaled Daqrouq, Rami Al-Hmouz, Jaafa Alghazo
2017 International Journal of Engineering and Technology  
Feature selection (FS) is the process of choosing the most informative and descriptive characteristics of a signal, namely those that lead to better classification. It is used in many areas, such as machine learning, pattern recognition, and signal processing. FS reduces the dimensionality of a signal while preserving the most informative features for further processing. A speech signal can consist of thousands of features. Feature extraction methods such as Average Framing Linear Prediction Coding (AFLPC) using the wavelet transform reduce the number of features from thousands to hundreds. However, the resulting feature vector still contains some redundancy. In addition, some features are similar across classes and do not help to discriminate between them. Including such features in the classification process does not help to identify classes; on the contrary, they only confuse the classifier and hinder accurate identification. This paper proposes an FS method that uses evolutionary optimization techniques to select the most informative features, those that maximize the classification rates of Bayesian classifiers. The classification rate is also maximized by modeling the features with the proper number of Gaussian distributions. The results of a comparative analysis show that the selection-based individual speaker model gives the best classification rate.

Keyword - Feature Selection, Speaker Identification, Bayes Theorem.

I. INTRODUCTION

Research on automatic speech recognition (ASR) has been actively conducted over the past four decades [1]. ASR is a tool with many potential applications, such as automation of operator-assisted services and speech-to-text systems for hearing-impaired individuals [2]. In speaker recognition systems, the speech signal is represented by several features, which play a major part in system design. Karhunen-Loeve transform (KLT) based features [3], Mel Frequency Cepstral Coefficients (MFCC) [4], Linear Predictive Cepstral Coefficients (LPCC) [5], and wavelet transform-based features [6]-[8] are examples of speech signal features. Various approaches have been proposed to reduce the number of features required for speech recognition. Paliwal [9] reduced the dimensionality of feature vectors in speech recognition systems and tested the technique on four methods.
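The feature selection idea summarized in the abstract (searching for the feature subset that maximizes the classification rate of a Bayesian classifier) can be illustrated with a minimal sketch. The code below is not the authors' implementation. It assumes a binary particle swarm optimizer with a sigmoid update rule, synthetic two-class data in place of AFLPC speech features, and a single Gaussian per feature and class in place of the multi-Gaussian models mentioned in the abstract. All function names and parameter values (`make_data`, `binary_pso`, `w`, `c1`, `c2`, and so on) are hypothetical.

```python
import math
import random

def make_data(rng, n_per_class=40, n_informative=3, n_noise=5):
    """Synthetic two-class data; only the first n_informative features differ by class."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(3.0 * label, 1.0) for _ in range(n_informative)]
            row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            X.append(row)
            y.append(label)
    return X, y

def fit_gaussians(X, y, mask):
    """Estimate a log prior plus per-feature mean and variance for each class."""
    cols = [j for j, keep in enumerate(mask) if keep]
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        stats = []
        for j in cols:
            vals = [r[j] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats.append((j, mu, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model

def predict(model, x):
    """Return the class with the highest log posterior (independent Gaussians)."""
    def log_posterior(label):
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mu) ** 2 / (2 * var)
            for j, mu, var in stats)
    return max(model, key=log_posterior)

def accuracy(mask, train, valid):
    """Classification rate on the validation set using only the masked features."""
    if not any(mask):
        return 0.0
    model = fit_gaussians(train[0], train[1], mask)
    hits = sum(predict(model, x) == t for x, t in zip(valid[0], valid[1]))
    return hits / len(valid[1])

def binary_pso(fitness, n_features, rng, n_particles=10, n_iters=20,
               w=0.7, c1=1.5, c2=1.5):
    """Binary PSO: each particle is a 0/1 mask; a sigmoid of the velocity gives P(bit = 1)."""
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pos[0] = [1] * n_features  # seed one particle with the full feature set
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(n_features):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid away from 0 and 1
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def demo(seed=0):
    """Split synthetic data in half, then compare all features against the PSO-selected mask."""
    rng = random.Random(seed)
    X, y = make_data(rng)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train = ([X[i] for i in idx[:half]], [y[i] for i in idx[:half]])
    valid = ([X[i] for i in idx[half:]], [y[i] for i in idx[half:]])
    n_features = len(X[0])
    baseline = accuracy([1] * n_features, train, valid)
    mask, best = binary_pso(lambda m: accuracy(m, train, valid), n_features, rng)
    return mask, best, baseline

mask, best, baseline = demo()
print("all features:", baseline)
print("selected mask:", mask, "classification rate:", best)
```

Because one particle starts with every feature enabled, the selected mask can never score below the full feature set on the validation split. The reported rate is optimistic, since the same split drives the search; a held-out test set would be needed for an unbiased estimate.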
In [10], the Laplacian Eigenmaps Latent Variable Model (LELVM) used fewer MFCC vectors without affecting the recognition rate, and it exhibited better performance than Principal Component Analysis (PCA). Feature frame selection based on phonetic information has also been investigated to increase the classification rate; however, the exact phonemes cannot be easily extracted [11]. Joint factor analysis (JFA) [12]-[14] is commonly used to enhance the performance of text-independent speaker verification systems by modeling speaker and session variability. This work has been extended to the i-vector, which outperforms JFA in terms of complexity and model size [15]. The classification rate increases with the number of feature frames available for training and testing [16]. However, performance does not continue to improve if more features are added and redundancies exist in the features; consequently, some features can be ignored with no effect on recognition performance [17]. Researchers have also focused on selecting valuable features in speech recognition systems. For example, a Euclidean distance measure has been used to determine the frame rate [18], an entropy-based approach has been utilized in speech signal in-frame selection [19], and maximum likelihood-based feature selection has also been investigated [20]. These methods select valuable features in speech signals, but they tend to ignore the redundancy of features in feature frames. Further, reports indicate that maximizing feature information does not lead to a better classification rate [21], [22]. Features should contain minimum redundancy within the selected

ISSN (Print): 2319-8613 ISSN (Online): 0975-4024
doi:10.21817/ijet/2017/v9i3/170903045 fatcat:2aar4cfjircexok6wsycjsq47q