Development of Multi-lingual Spoken Corpora of Indian Languages [chapter]

K. Samudravijaya
2006 Lecture Notes in Computer Science  
This paper describes a recently initiated effort to collect and transcribe both read and spontaneous speech data in four Indian languages. The completed preparatory work includes the design of phonetically rich sentences, a data acquisition setup for recording speech over a telephone channel, and a Wizard of Oz setup for acquiring speech from spoken dialogues between a caller and the machine in the context of a remote information retrieval task. The paper gives an account of the care taken to collect data that is as close to real-world conditions as possible. It also describes the current status of the programme and the actions planned to achieve its goal.
doi:10.1007/11939993_79 fatcat:4bhyoimb7nd7lcxord75373jkq