Analysis and modeling of between-sentence pauses in news speech by Japanese newscasters

Shizuka Nakamura, Carlos Toshinori Ishi, Tatsuya Kawahara
2020 10th International Conference on Speech Prosody 2020   unpublished
Many speech synthesizers hardly consider between-sentence pauses, which could be one factor behind the monotony of continuous synthesized speech. Aiming to break this monotony and make synthesized speech sound more like news speech, we analyzed the between-sentence pause durations in news speech by two newscasters and constructed a model to predict these durations. Analysis of the pause durations first revealed that the difference in the distributions between the two newscasters is largely affected by pauses after lead sentences, which allow a large degree of freedom. Then, prosodic context analysis showed that the following features correlate with between-sentence pause durations: the mean F0 of the last part of the preceding sentence, and the number of morae in the subsequent sentence. The correlation coefficient between the values predicted by a multiple linear regression model using these parameters and the measured values was 0.44 for the test data. Thus, between-sentence pause durations can be predicted to some extent from prosodic information of the preceding and subsequent speech. The news-speech likeness of continuous synthesized speech can be improved by incorporating this model into existing speech synthesizers that generate speech sentence by sentence.
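As a rough illustration of the kind of model the abstract describes, the sketch below fits a multiple linear regression that predicts pause duration from two features (mean F0 at the end of the preceding sentence, and mora count of the subsequent sentence) and reports the correlation between predicted and measured values on held-out data. Everything here is an assumption made for illustration: the function names, the train/test split, the units, and above all the data, which is synthetic with arbitrarily chosen coefficients. It is not the authors' implementation and does not reproduce their corpus or their result of 0.44.

```python
import numpy as np


def fit_linear_model(features, pauses):
    """Least-squares fit of pause = b0 + b1 * mean_f0 + b2 * n_morae."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, pauses, rcond=None)
    return coef


def predict(coef, features):
    """Apply the fitted coefficients to a feature matrix."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(predicted, measured)[0, 1])


# Synthetic stand-in data (hypothetical values, NOT from the paper).
rng = np.random.default_rng(0)
n = 200
mean_f0 = rng.normal(150.0, 20.0, n)             # Hz, end of preceding sentence
n_morae = rng.integers(10, 80, n).astype(float)  # morae in subsequent sentence
# Arbitrary illustrative relationship plus noise; signs and sizes are made up.
pause = 0.8 - 0.002 * mean_f0 + 0.004 * n_morae + rng.normal(0.0, 0.15, n)

features = np.column_stack([mean_f0, n_morae])
train, test = slice(0, 150), slice(150, None)

coef = fit_linear_model(features[train], pause[train])
pred = predict(coef, features[test])
r = pearson_r(pred, pause[test])
print("coefficients (b0, b1, b2):", coef)
print("test-set correlation:", r)
```

In a synthesizer that generates speech sentence by sentence, a predictor of this shape could be called once per sentence boundary, with the first feature taken from the sentence just synthesized and the second from the text of the next one.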
doi:10.21437/speechprosody.2020-139 fatcat:trrwzx7qnzcvdag54g7sqlwxcy