Online speech detection and dual-gender speech recognition for captioning broadcast news

Toru Imai, Shoei Sato, Akio Kobayashi, Kazuo Onoe, Shinichi Homma
2006 Interspeech 2006 unpublished
This paper describes two new methods for captioning broadcast news: online speech detection and dual-gender speech recognition. The proposed online speech detection performs dual-gender phoneme recognition and detects a start point and an end point from the ratio between the cumulative phoneme likelihood and the cumulative non-speech likelihood, with a very small delay from the audio input. As soon as the start point is detected, the subsequent continuous speech recognizer, which runs gender-dependent acoustic models in parallel, starts a search, using gender-change information from the preceding phoneme recognizer to reduce computational cost. Speech recognition experiments on conversational commentaries and field reports from Japanese broadcast news showed that the proposed speech detection method was effective in reducing both false segmentations and recognition errors compared with a conventional method using adaptive energy thresholds. The proposed dual-gender speech recognition with the new speech detection significantly reduced the word error rate by 11.2% relative to a conventional gender-independent system, while keeping the computational cost within real time.
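The likelihood-ratio idea behind the speech detection can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the decision rule, so the clamped (CUSUM-style) accumulation, the function name `detect_segments`, the threshold values, and the per-frame log-likelihood inputs are all assumptions made for illustration. Working in the log domain, the ratio of cumulative likelihoods becomes a running sum of per-frame differences.

```python
from typing import List, Tuple


def detect_segments(
    speech_loglik: List[float],
    nonspeech_loglik: List[float],
    start_threshold: float = 6.0,  # hypothetical value, not from the paper
    end_threshold: float = 6.0,    # hypothetical value, not from the paper
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs; end_frame is exclusive.

    Inputs are per-frame log-likelihoods of the best phoneme hypothesis
    and of the non-speech model (assumed to come from an upstream
    phoneme recognizer that is not modeled here).
    """
    if len(speech_loglik) != len(nonspeech_loglik):
        raise ValueError("score sequences must have equal length")

    segments: List[Tuple[int, int]] = []
    in_speech = False
    evidence = 0.0  # accumulated log-likelihood ratio, clamped at zero
    start = 0

    for t, (s, n) in enumerate(zip(speech_loglik, nonspeech_loglik)):
        llr = s - n  # positive when the frame looks like speech
        if not in_speech:
            # Accumulate evidence for speech (assumed CUSUM-style clamp).
            evidence = max(0.0, evidence + llr)
            if evidence >= start_threshold:
                in_speech, start, evidence = True, t, 0.0
        else:
            # Accumulate evidence for non-speech to find the end point.
            evidence = max(0.0, evidence - llr)
            if evidence >= end_threshold:
                segments.append((start, t))
                in_speech, evidence = False, 0.0

    if in_speech:
        # Input ended while still inside a speech segment.
        segments.append((start, len(speech_loglik)))
    return segments
```

In this sketch the start point is reported at the frame where the threshold is crossed, so the detection delay is only as many frames as the evidence needs to accumulate. A real system would tune both thresholds on held-out audio.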
doi:10.21437/interspeech.2006-448 fatcat:ycsiuw5gfjacbmwult43utpu34