Histogram of gradients of Time-Frequency Representations for Audio Scene Detection

Alain Rakotomamonjy, Gilles Gasso
2014 IEEE/ACM Transactions on Audio, Speech, and Language Processing  
This paper addresses the problem of audio scene classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering the histogram of gradients (HOG) of an audio scene's time-frequency representation. In contrast to classical audio features such as MFCC, we hypothesize that histograms of gradients are able to encode relevant information in a time-frequency representation: namely, the local direction of variation (in time and frequency) of the signal spectral power. In addition, in order to gain more invariance and robustness, histograms of gradients are locally pooled. We have evaluated the relevance of the novel feature by comparing its performance with state-of-the-art competitors on several datasets, including a novel one that we provide as part of our contribution. This dataset, which we make publicly available, involves 19 classes and contains about 1500 minutes of audio scene recordings. We thus believe that it may be the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that the HOG-based features outperform their competitors.
doi:10.1109/taslp.2014.2375575 fatcat:zru7ljznkjelfi7h7biaen4xve
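
The abstract describes the feature only at a high level: gradients of a time-frequency representation, summarized as orientation histograms and then locally pooled. The following is a minimal illustrative sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for the example: the log-power STFT as the time-frequency representation, the function names `spectrogram` and `hog_features`, the 8x8 cell size, the 8 orientation bins, average pooling along time, and the final L2 normalization.

```python
import numpy as np


def spectrogram(signal, n_fft=256, hop=128):
    """Log-power STFT of a 1-D signal, shape (freq_bins, frames).

    Assumption: the abstract does not say which time-frequency
    representation is used; a plain STFT is chosen here for simplicity.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10).T


def hog_features(tfr, n_bins=8, cell=(8, 8)):
    """Orientation histograms per cell, average-pooled along time.

    Assumption: cell size, bin count, and pooling scheme are
    illustrative choices, not values taken from the paper.
    """
    # Local variation of spectral power along frequency (axis 0) and time (axis 1).
    d_freq, d_time = np.gradient(tfr)
    magnitude = np.hypot(d_freq, d_time)
    # Unsigned orientation in [0, pi), quantized into n_bins.
    orientation = np.mod(np.arctan2(d_freq, d_time), np.pi)
    bin_idx = np.minimum((orientation / np.pi * n_bins).astype(int), n_bins - 1)

    n_f = tfr.shape[0] // cell[0]
    n_t = tfr.shape[1] // cell[1]
    hist = np.zeros((n_f, n_t, n_bins))
    for i in range(n_f):
        for j in range(n_t):
            rows = slice(i * cell[0], (i + 1) * cell[0])
            cols = slice(j * cell[1], (j + 1) * cell[1])
            # Magnitude-weighted histogram of gradient orientations in this cell.
            hist[i, j] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=n_bins,
            )

    # Pool histograms over all time cells, keeping frequency resolution.
    pooled = hist.mean(axis=1).ravel()
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled


# Usage on a synthetic one-second signal (a noisy 440 Hz tone at 8 kHz).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
features = hog_features(spectrogram(signal))
print(features.shape)
```

Pooling over time in this sketch discards where in the recording a pattern occurs while keeping which frequency band it occurs in, which is one simple way to obtain the kind of invariance the abstract mentions. The resulting fixed-length vector could then be passed to any standard classifier.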