Robust spoken document retrieval methods for misrecognition and out-of-vocabulary keywords

Hiromitsu Nishizaki, Seiichi Nakagawa
2004 Systems and Computers in Japan  
This paper describes a Japanese spoken document retrieval (SDR) system that is robust to out-of-vocabulary (OOV) words. In the standard SDR approach, which automatically transcribes spoken documents into word sequences, documents containing OOV words cannot be retrieved. To avoid this problem, we propose a novel SDR method that takes OOV keywords into account. The method uses word-based indexing for in-vocabulary keywords and syllable-based indexing for OOV keywords, and switches between them according to whether each keyword in the query is in-vocabulary or OOV. Evaluation results show that the proposed technique is quite effective in robustly retrieving spoken documents.
Keywords: spoken document retrieval, word spotting, OOV detection processing, and hybrid index
doi:10.1002/scj.10697 fatcat:liynm3th55hkpmexm7ovomr7qi
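
The index switching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy romanized transcripts, and the pronunciation table are all hypothetical, and exact syllable matching stands in for whatever error-tolerant word spotting the paper actually uses. In-vocabulary keywords are looked up in a word index, while OOV keywords are searched for in syllable transcripts.

```python
from collections import defaultdict

def build_word_index(word_docs):
    """Inverted index: recognized word -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, words in word_docs.items():
        for w in words:
            index[w].add(doc_id)
    return index

def contains_run(seq, pattern):
    """True if `pattern` occurs as a contiguous run inside `seq` (exact match only)."""
    n = len(pattern)
    return any(seq[i:i + n] == pattern for i in range(len(seq) - n + 1))

def retrieve(keywords, vocabulary, word_index, syllable_docs, pronunciations):
    """Switch per keyword: word index if in-vocabulary, syllable search if OOV."""
    hits = {}
    for kw in keywords:
        if kw in vocabulary:
            hits[kw] = set(word_index.get(kw, set()))
        else:
            # Assumption: the syllable sequence of an OOV keyword is available
            # from a hypothetical pronunciation table.
            pattern = pronunciations[kw]
            hits[kw] = {doc_id for doc_id, syllables in syllable_docs.items()
                        if contains_run(syllables, pattern)}
    return hits

# Toy data (invented for illustration): word-level and syllable-level
# transcripts of the same two spoken documents.
WORD_DOCS = {"d1": ["tenki", "yohou"], "d2": ["yakyuu", "news"]}
SYLLABLE_DOCS = {
    "d1": ["te", "n", "ki", "yo", "ho", "u"],
    "d2": ["ya", "kyu", "u", "i", "chi", "ro", "o"],
}
VOCABULARY = {"tenki", "yohou", "yakyuu", "news"}
PRONUNCIATIONS = {"ichiro": ["i", "chi", "ro", "o"]}

word_index = build_word_index(WORD_DOCS)
result = retrieve(["tenki", "ichiro"], VOCABULARY, word_index,
                  SYLLABLE_DOCS, PRONUNCIATIONS)
print(result)
```

In this toy setup the OOV keyword "ichiro" never appears in the word transcripts, because a word recognizer can only output vocabulary words, yet it is still found in document d2 through the syllable transcript.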