Robust sound event detection in binaural computational auditory scene analysis
[article]
Ivo Trowitzsch, Technische Universität Berlin, Technische Universität Berlin, Klaus Obermayer
2020
Automatic sound event detection and computational auditory scene analysis gain importance through the increasing prevalence of technical systems operating autonomously or in the background, since such operation requires awareness of the system's environment. In realistic scenes, reliable sound event detection, despite the big improvements of the related automatic speech recognition, still poses a difficult problem: general sounds often are less definable than speech and exhibit less
more »
... and rules; commonly, many sounds occur simultaneously and in all kinds of acoustic environments. Binaural robotic systems are particularly interesting due to their resemblance of human means, but they are also more limited through the restriction to two microphones, specifically regarding spatial acoustic scene analysis. Spatial hearing figures prominently in humans, but for automatic sound event detection so far has gone mostly unregarded. One of the core objectives running through the entire thesis is the development of fundamental systematic methodology with respect to (a) the building of robust sound event detection models, and (b) the elaborate analysis regarding their application in many different situations --- both is underrepresented in available research publications. In the hereinafter presented studies, sound event detection models are built in different training schemes and evaluated in detail with respect to their performance in various acoustic scene conditions. Analyses are conducted on scenes with one to four co-occurring sound events, with sound-to-sound energy ratios of -20 dB to +20 dB, with different spatial source distributions, and in diverse acoustic environments from anechoic to church aula. It is shown (i) to which extent models that have been trained under specific acoustic conditions specialize to these, and (ii) that even with simple algorithms like logistic regression, through acoustically multifarious training almost optimal performances as achieved by the specialized models can be obtained [...]
doi:10.14279/depositonce-9857
fatcat:cgdb2p36r5fyzjba7pkpbkmoji