Document expansion for speech retrieval

Amit Singhal, Fernando Pereira
1999 Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '99  
Advances in automatic speech recognition allow us to search large speech collections using traditional information retrieval methods. The problem of "aboutness" for documents (is a document about a certain concept?) has been at the core of document indexing for the entire history of IR. This problem is more difficult for speech indexing and retrieval since automatic speech transcriptions often contain mistakes. In this study we show that document expansion can be successfully used to alleviate the effects of transcription mistakes on speech retrieval. Using document expansion, the loss of retrieval effectiveness due to automatic transcription errors can be reduced from 15-27% relative to retrieval from human transcriptions to only about 7-13%, even for automatic transcriptions with word error rates as high as 65%. For good automatic transcriptions (25% word error rate), retrieval effectiveness with document expansion is indistinguishable from retrieval from human transcriptions. This makes speech retrieval from automatic transcriptions, even poor ones, competitive with retrieval from perfect transcriptions.
doi:10.1145/312624.312645 dblp:conf/sigir/SinghalP99 fatcat:cba7wsulfrcwzg2fydzmulpsmy