Speech Translation Technology in MASTAR Project

Eiichiro Sumita
2012 Journal of NICT  
Speech and language processing have advanced dramatically in recent years. This is partly due to the establishment of corpus-based technology, i.e., technology that collects a large volume of data (a corpus) and processes it automatically with machine learning algorithms. In conventional approaches, a system is first developed and then further modified for practical application. In recent methods, by contrast, data is collected while the system is actually in use, and the collected data is then used for machine learning. Performance can therefore be improved during the research and development phase through actual use of the system. This marks the start of a new approach to research and development, and the spread of the Web is expected to support further progress. By using the framework of the Web and the information distributed on it, one can collect proper nouns from all over the world, develop multilingual …

The MASTAR (Multi-lingual Advanced Speech and Text Research) project was launched in April 2008 at the National Institute of Information and Communications Technology (NICT). The project includes research and development aimed at breaking down language barriers between people who speak different languages and barriers between humans and machines. Research on language resources, language translation, speech communication, and information analysis is conducted intensively and collaboratively within the project. This paper concentrates on VoiceTra, an application of speech translation technology.
doi:10.24812/nictjournal.59.3.4_205