BLSTM-CRF Based End-to-End Prosodic Boundary Prediction with Context Sensitive Embeddings in a Text-to-Speech Front-End

Yibin Zheng, Jianhua Tao, Zhengqi Wen, Ya Li
2018 Interspeech 2018  
In this paper, we propose a language-independent end-to-end architecture for prosodic boundary prediction based on BLSTM-CRF. The proposed architecture has three components: a word embedding layer, a BLSTM layer, and a CRF layer. The word embedding layer learns task-specific embeddings for prosodic boundary prediction. The BLSTM layer can efficiently use both past and future input features, while the CRF layer can efficiently use sentence-level information. We integrate these three components and learn the whole process end-to-end. In addition, we investigate adding both character-level embeddings and context-sensitive embeddings to this model, and employ an attention mechanism for combining alternative word-level embeddings. With the attention mechanism, the model is able to decide how much information to use from each level of embeddings. Objective evaluation results show that the proposed BLSTM-CRF architecture achieves the best results on both Mandarin and English datasets, with absolute improvements of 3.21% and 3.74% in F1 score, respectively, for intonational phrase prediction, compared to the previous state-of-the-art method (BLSTM). The subjective evaluation results further indicate the effectiveness of the proposed methods.
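
The idea of letting the model decide how much to take from each level of embeddings can be illustrated with a small gating sketch. This is not the paper's exact formulation: the function name `combine_embeddings`, the per-dimension gate parameters `gate_w`, `gate_c`, `gate_b`, and the element-wise form of the gate are all assumptions made for illustration. In a trained model these parameters would be learned; here they are passed in by hand.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_embeddings(word_vec, char_vec, gate_w, gate_c, gate_b):
    """Blend two same-sized embeddings with a per-dimension gate.

    Hypothetical sketch: for each dimension, a gate z in (0, 1) is computed
    from both inputs, and the output is z * word + (1 - z) * char.
    Returns the combined vector and the gate values.
    """
    sizes = {len(word_vec), len(char_vec), len(gate_w), len(gate_c), len(gate_b)}
    if len(sizes) != 1:
        raise ValueError("all vectors must have the same length")

    combined, gates = [], []
    for w, c, a, b, bias in zip(word_vec, char_vec, gate_w, gate_c, gate_b):
        z = sigmoid(a * w + b * c + bias)       # how much to trust the word-level value
        gates.append(z)
        combined.append(z * w + (1.0 - z) * c)  # convex combination of the two levels
    return combined, gates


if __name__ == "__main__":
    word = [1.0, 2.0]   # toy word-level embedding
    char = [3.0, 0.0]   # toy character-level embedding
    zeros = [0.0, 0.0]
    mixed, z = combine_embeddings(word, char, zeros, zeros, zeros)
    print(mixed, z)
```

With all gate parameters set to zero, every gate equals 0.5 and the output is the plain average of the two embeddings; a large positive bias pushes the gate toward 1, so the output follows the word-level embedding.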
doi:10.21437/interspeech.2018-1472 dblp:conf/interspeech/ZhengTWL18 fatcat:r4v4bhjawnbabnxtunbpmj7bcm