A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Automatic metadata generation and video editing based on speech and image recognition for medical education contents
2006
Interspeech 2006
unpublished
This paper reports a metadata generation system and an automatic video editing system. Metadata are data that describe other data. In the audio metadata generation system, speech recognition is performed on the input speech using a general language model (LM) and a specialized LM to obtain segments (event groups) and audio metadata (event information), respectively. In the video editing system, visual metadata obtained by image recognition and audio metadata are combined into
doi:10.21437/interspeech.2006-618
fatcat:mqwlrmv4a5g23it7z5tau2qriq