Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection
[article]
2021, arXiv pre-print
Joint sound event localization and detection (SELD) is an emerging audio signal processing task that adds spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is using convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input and the RNNs learn temporal relationships from these high-level features. However, RNNs have some drawbacks, such as a limited capability to model [...]
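As a rough illustration of the CRNN pattern described in the abstract, the sketch below (PyTorch, with assumed channel counts, feature sizes, and output format; not the authors' exact architecture) stacks convolutional blocks that pool only the frequency axis, then runs a recurrent layer over the time axis and a frame-wise output head.

```python
# Minimal CRNN sketch for SELD-style input (hypothetical sizes, not the paper's model).
# Input:  (batch, audio_channels, time_frames, mel_bins) spectrogram features.
# Output: (batch, time_frames, num_classes * 3) frame-wise predictions.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, audio_channels=4, mel_bins=64, num_classes=12, rnn_hidden=128):
        super().__init__()
        # CNN: learns high-level features, pooling only the frequency axis
        # so temporal resolution is preserved for the RNN.
        self.cnn = nn.Sequential(
            nn.Conv2d(audio_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 4)),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 4)),
        )
        cnn_out = 128 * (mel_bins // 16)          # features per time frame after pooling
        # RNN: models temporal relationships across the learned features.
        self.rnn = nn.GRU(cnn_out, rnn_hidden, batch_first=True, bidirectional=True)
        # Frame-wise head (here: 3 values per class, an assumed output layout).
        self.head = nn.Linear(2 * rnn_hidden, num_classes * 3)

    def forward(self, x):                          # x: (B, C, T, F)
        z = self.cnn(x)                            # (B, 128, T, F/16)
        z = z.permute(0, 2, 1, 3).flatten(2)       # (B, T, 128 * F/16)
        z, _ = self.rnn(z)                         # (B, T, 2 * rnn_hidden)
        return self.head(z)                        # (B, T, num_classes * 3)

# Example: batch of 2, 4-channel input, 100 frames, 64 mel bins.
out = CRNN()(torch.randn(2, 4, 100, 64))
print(out.shape)                                   # torch.Size([2, 100, 36])
```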
arXiv:2107.09388v2
fatcat:pnb5b3f6jzfrho6bu7lgydatia