Relevance-based quantization of scattering features for unsupervised mining of environmental audio
EURASIP Journal on Audio, Speech, and Music Processing
The emerging field of computational acoustic monitoring aims at retrieving high-level information from acoustic scenes recorded by a network of sensors. These networks gather large amounts of data requiring analysis. To decide which parts to inspect further, we need tools that automatically mine the data, identifying recurring patterns and isolated events. This requires a similarity measure for acoustic scenes that does not impose strong assumptions on the data. The state of the art in audio similarity measurement is the "bag-of-frames" approach, which models a recording using summary statistics of short-term audio descriptors, such as mel-frequency cepstral coefficients (MFCCs). Such models successfully characterize static scenes with little variability in auditory content, but cannot accurately capture scenes with a few salient events superimposed over a static background. To overcome this issue, we propose a two-scale representation which describes a recording using clusters of scattering coefficients. The scattering coefficients capture short-scale structure, while the cluster model captures longer time scales, allowing for more accurate characterization of sparse events. Evaluation within the acoustic scene similarity framework demonstrates the merit of the proposed approach.
Keywords: unsupervised learning · data mining · acoustic signal processing · wavelet transforms · audio databases · content-based retrieval · nearest neighbor searches · acoustic sensors · environmental sensors.
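The contrast between a bag-of-frames summary and a cluster-based description can be illustrated with a minimal sketch. It is not the method evaluated in this paper: the frame vectors are toy two-dimensional stand-ins for scattering coefficients, the clustering is plain k-means, and the helper names (`bag_of_frames`, `kmeans`, `directed_gap`) and the comparison rule are assumptions made for illustration only.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bag_of_frames(frames):
    """Summarise a recording by the per-dimension mean of its frames."""
    n, dim = len(frames), len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def kmeans(frames, k, iters=20, seed=0):
    """Plain Lloyd's k-means (illustrative stand-in for the cluster model)."""
    rng = random.Random(seed)
    centroids = rng.sample(frames, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda c: euclidean(f, centroids[c]))
            groups[nearest].append(f)
        # Keep the previous centroid when a cluster receives no frames.
        centroids = [bag_of_frames(g) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids


def directed_gap(cents_a, cents_b):
    """Largest distance from a centroid of A to its nearest centroid of B."""
    return max(min(euclidean(a, b) for b in cents_b) for a in cents_a)


# Toy scenes: identical static background, but scene_b has two "event" frames.
background = [[0.0, 0.0] for _ in range(98)]
scene_a = background + [[0.0, 0.0], [0.0, 0.0]]
scene_b = background + [[10.0, 10.0], [10.0, 10.0]]

bof_gap = euclidean(bag_of_frames(scene_a), bag_of_frames(scene_b))
cluster_gap = directed_gap(kmeans(scene_b, k=2), kmeans(scene_a, k=2))
print(bof_gap, cluster_gap)
```

In this toy setting the two event frames barely shift the mean, so the bag-of-frames gap stays small, whereas the cluster description gives the event its own centroid and the cluster-based gap is large.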