Analysis of Audio Clustering using Word Descriptions

Shiva Sundaram, Shrikanth Narayanan
2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07
We present an analysis of clustering audio clips using word descriptions that imitate sounds. These onomatopoeia words describe the acoustic properties of sound sources, and they can be useful for annotating a medium that cannot embed audio (e.g., text). First, an audio-to-word relationship is established by manually tagging a variety of audio clips from a sound effects library with onomatopoeia words. Using a newly proposed distance metric for word-level similarities, the feature vectors from the audio are clustered according to their tags, resulting in clusters whose members have similar onomatopoeic descriptions. Through discriminant analysis of the clusters at the feature level, we present results on the separability of these clusters. Our results indicate that meaningful clusters with similar acoustic properties can be formed using onomatopoeic descriptions alone. However, at the level of audio feature representation, clusters formed by some word groups, such as buzz, fizz, etc., are better represented by signal features than clusters of percussive sounds such as clang, clank, and tap.

Index Terms: audio ontology, audio information retrieval, analysis of audio clusters, onomatopoeia-based audio descriptions
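The pipeline described above (tag clips with onomatopoeia words, group them by word-level similarity, then test how separable the resulting groups are in feature space) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not define its "newly proposed distance metric", so the sketch substitutes plain Jaccard distance between tag sets; the average-linkage clustering, the `threshold` parameter, and the one-feature Fisher ratio (a simplified stand-in for full discriminant analysis) are likewise hypothetical choices. The tags and feature values in the usage example are toy data, not results from the paper.

```python
from itertools import combinations
from statistics import mean, pvariance


def jaccard_distance(tags_a, tags_b):
    """Stand-in word-level distance (NOT the paper's metric):
    1 - |A intersect B| / |A union B| over two sets of onomatopoeia tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster_by_tags(tag_sets, threshold):
    """Hypothetical average-linkage agglomerative clustering on tag distances.

    Starts with one cluster per clip and repeatedly merges the two clusters
    with the smallest average pairwise tag distance, stopping once that
    distance exceeds `threshold`. Returns lists of clip indices.
    """
    clusters = [[i] for i in range(len(tag_sets))]

    def avg_dist(c1, c2):
        return mean(jaccard_distance(tag_sets[i], tag_sets[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), avg_dist(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        # combinations() yields i < j, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def fisher_ratio(x1, x2):
    """Two-class Fisher criterion for a single scalar feature:
    (m1 - m2)^2 / (s1^2 + s2^2). Larger values mean better separability."""
    denom = pvariance(x1) + pvariance(x2)
    return (mean(x1) - mean(x2)) ** 2 / denom if denom > 0 else float("inf")


if __name__ == "__main__":
    # Toy tags: two "buzzy" clips and two percussive clips.
    tags = [{"buzz", "hum"}, {"buzz", "hum", "fizz"},
            {"clang", "clank"}, {"clang", "clank", "tap"}]
    groups = cluster_by_tags(tags, threshold=0.5)
    print(groups)  # -> [[0, 1], [2, 3]]

    # Hypothetical per-clip scalar feature values (toy numbers).
    feature = [0.8, 0.75, 0.2, 0.3]
    a, b = ([feature[k] for k in g] for g in groups)
    print(fisher_ratio(a, b))
```

Clustering on tags first and only afterwards measuring separability on audio features mirrors the order in the abstract: the groups are defined purely by word descriptions, and the feature-level analysis then asks how well signal features reflect those groups.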
doi:10.1109/icassp.2007.366349 dblp:conf/icassp/SundaramN07a