Age Estimation with Speech-Age Model for Heterogeneous Speech Datasets

Ryu Takeda, Kazunori Komatani
Interspeech 2021
This paper describes a method for estimating age from speech signals across heterogeneous datasets. Previous studies in the speech field evaluate age prediction models on held-out test data from the same dataset, recorded in a consistent setting, but such evaluation does not measure real-world performance. The difficulty with heterogeneous datasets is overfitting to corpus-specific properties: the transfer function of the recording environment and the distributions of age and speaker. We propose a speech-age model and its integration with sequence neural networks (NNs). The speech-age model represents the ambiguity of age as a probability distribution, which also virtually extends the limited age range covered by each corpus. A Bayesian generative model integrates the speech-age model and the NNs. We also apply a mean normalization technique to cope with the transfer function problem. Experiments showed that the proposed method outperformed a baseline neural classifier on test sets that were completely open with respect to both age distribution and recording setting.
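The abstract does not say which features or which variant of mean normalization the authors used. As a minimal sketch of the general idea only, assume the features are in a log-spectral or cepstral domain, where a fixed recording channel shows up as a constant additive offset per feature dimension; subtracting the per-utterance mean then cancels that offset. The function name `mean_normalize` and the toy data are hypothetical and not taken from the paper.

```python
from typing import List


def mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-utterance mean over time from every feature dimension.

    `frames` is a list of feature vectors (one per time frame), assumed to be
    log-domain features so that a fixed channel acts as an additive offset.
    """
    if not frames:
        return []
    num_frames = len(frames)
    num_dims = len(frames[0])
    # Mean of each feature dimension across all frames of the utterance.
    means = [sum(frame[d] for frame in frames) / num_frames for d in range(num_dims)]
    return [[frame[d] - means[d] for d in range(num_dims)] for frame in frames]


# Toy utterance: 3 frames, 2 feature dimensions (hypothetical values).
clean = [[0.0, 1.0], [2.0, 3.0], [4.0, 8.0]]

# Hypothetical fixed channel effect, additive in the log domain.
channel_offset = [0.5, -1.5]
recorded = [[x + c for x, c in zip(frame, channel_offset)] for frame in clean]

normalized_clean = mean_normalize(clean)
normalized_recorded = mean_normalize(recorded)
```

In this toy case `normalized_recorded` matches `normalized_clean`, which is the property that makes the technique useful when training and test corpora come from different recording setups. It only removes a channel that is constant over the utterance; time-varying distortion and additive noise are not handled by this step.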
doi:10.21437/interspeech.2021-861 fatcat:t2zpomjj4jhepedjommhbwqnsi