A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
Spoken term detection (STD) provides an efficient means for content-based indexing of speech. ... It also performs an analysis of the various approaches in terms of detection accuracy, storage requirements and execution time. ... In Ng and Zue (2000), phonetic transcripts of a spoken document are obtained using a phone recognizer. ... doi:10.1007/s10772-013-9217-1 fatcat:wdegow7dezc65osihpjhittgou
Spoken Language Understanding
In this chapter we discuss the retrieval and browsing of spoken audio documents. ... We focus primarily on the application of document search where a user provides a query and the system returns a set of audio documents that best match the query. ... At retrieval time, when an OOV query is encountered, the query is converted into a phone sequence and the phone index is used for retrieval. ... doi:10.1002/9781119992691.ch15 fatcat:o36ulm7kh5dxvhm6alb4yz3qvy
The data was taken from "spoken web" material collected over mobile phone connections by IBM India as well as from the LWAZI corpus of African languages. ... As part of the 2011 and 2012 MediaEval benchmark campaigns, a number of diverse systems were implemented by independent teams, and submitted to the "Spoken Web Search" task. ... The "Spoken Web Search" task was originally proposed by researchers from IBM India. ... doi:10.1016/j.csl.2013.12.004 fatcat:yqg35fkzjzhprbitj7si7oqxpu
Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). ... SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and ... Efficient. The representation should require a reasonable amount of computation at ASR-time and be very fast at search time. ... doi:10.1561/1500000020 fatcat:o424mjxnp5abbexhjsobtom2ry
The goal of Bayesian analysis is to reduce the uncertainty about unobserved variables by combining prior knowledge with observations. ... In addition to sharing data, the proposed model can learn non-ergodic structures and non-emitting states, something that HDPHMM does not support. ... This process is reminiscent of dynamic time warping (Furui, 1986) and is an integral part of any speech recognition training process. ... doi:10.34944/dspace/2957 fatcat:ce62muy7fffc3mq4t2kqk67mxm