CNN-LTE: A class of 1-X pooling convolutional neural networks on label tree embeddings for audio scene classification

Huy Phan, Philipp Koch, Lars Hertel, Marco Maass, Radoslaw Mazur, Alfred Mertins
2017 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
A class of simple 1-X (i.e. 1-max, 1-mean, and 1-mix) pooling convolutional neural networks, which are tailored for the task at hand, are finally learned on top of the image features for scene recognition ... Firstly, given the label set of the scenes, a label tree is automatically constructed where the labels are grouped into meta-classes. ... Afterward, we train different 1-X pooling convolutional neural networks (CNN), including 1-max, 1-mean, and 1-mix pooling CNNs, on top of these images for classification. ...
doi:10.1109/icassp.2017.7952133 dblp:conf/icassp/PhanKHMMM17 fatcat:42wa2ygnfrgqnl2qg6dxwg6mkq

CNN-LTE: a Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Recognition [article]

Huy Phan, Lars Hertel, Marco Maass, Philipp Koch, Alfred Mertins
2016 arXiv pre-print
Different convolutional neural networks, which are tailored for the task at hand, are finally learned on top of the image features for scene recognition. ... This category taxonomy is then used in the feature extraction step in which an audio scene instance is represented by a label tree embedding image. ... Afterward, we trained different 1-X pooling convolutional neural networks (CNN) [9], including 1-max, 1-mean, and 1-mix pooling CNNs, on top of these images for recognition. ...
arXiv:1607.02303v2 fatcat:u7xgv6sn7vaubart6rmewpnyeq