804 Hits in 8.6 sec

Two-Stage Polyphonic Sound Event Detection Based on Faster R-CNN-LSTM with Multi-Token Connectionist Temporal Classification

Inyoung Park, Hong Kook Kim
2020 Interspeech 2020  
We propose a two-stage sound event detection (SED) model to deal with sound events overlapping in time-frequency.  ...  To train efficiently on polyphonic sound, we take only one PR for each sound event from a bounding box regressor associated with the attention-LSTM.  ...  The second stage of the proposed SED consists of two CNNs combined with one LSTM for feature representation, and a softmax layer combined with CTC for event detection and classification.  ... 
doi:10.21437/interspeech.2020-3097 dblp:conf/interspeech/ParkK20 fatcat:gaiwsbqtmna4ncg7dnhyee7eem

Using a Low-Power Spiking Continuous Time Neuron (SCTN) for Sound Signal Processing

Moshe Bensimon, Shlomo Greenberg, Moshe Haiut
2021 Sensors  
This work presents a new approach based on a spiking neural network for sound preprocessing and classification.  ...  We propose a biologically plausible sound classification framework that uses a Spiking Neural Network (SNN) for detecting the embedded frequencies contained within an acoustic signal.  ... 
doi:10.3390/s21041065 pmid:33557214 pmcid:PMC7913968 fatcat:qhpnb47ahbcchluzuofyx5rat4

Singing Voice Detection: A Survey

Ramy Monir, Daniel Kostrzewa, Dariusz Mrozek
2022 Entropy  
It illustrates a comparison between existing methods for singing voice detection, mainly based on the Jamendo and RWC datasets.  ...  This paper presents a survey on the techniques of singing voice detection with a deep focus on state-of-the-art algorithms such as convolutional LSTM and GRU-RNN.  ...  [64] proposed a recurrent neural network model for voice activity detection.  ... 
doi:10.3390/e24010114 pmid:35052140 pmcid:PMC8775013 fatcat:nt3wnmf4e5anxiiinkpvuqxwfq

Biologically inspired emotion recognition from speech

Laura Caponetti, Cosimo Alessandro Buscicchio, Giovanna Castellano
2011 EURASIP Journal on Advances in Signal Processing  
Emotion recognition has become a fundamental task in human-computer interaction systems. In this article, we propose an emotion recognition approach based on biologically inspired methods.  ...  Specifically, emotion classification is performed using a long short-term memory (LSTM) recurrent neural network, which is able to recognize long-range dependencies between successive temporal patterns.  ...  In [37], an approach for continuous emotion recognition based on an LSTM network is introduced, where emotion is represented by continuous values on multiple attribute axes, such as valence, activation,  ... 
doi:10.1186/1687-6180-2011-24 fatcat:6s3io7drmrfr5gchwaptjzbldy

Deep Learning Bidirectional LSTM based Detection of Prolongation and Repetition in Stuttered Speech using Weighted MFCC

Sakshi Gupta, Ravi S., Rajesh K., Rajesh Verma
2020 International Journal of Advanced Computer Science and Applications  
The classification models are evaluated based on the accuracy of the classification of stuttered events.  ...  The promising recognition accuracy of 97.33%, 98.67%, 97.5%, 97.19%, and 97.67% was achieved for the detection of fluent, prolongation, syllable, word, and phrase repetition, respectively.  ...  The study evaluates the efficacy of the Bi-LSTM model, based on the accuracy of the classification of stuttered events.  ... 
doi:10.14569/ijacsa.2020.0110941 fatcat:js6bgnpqirhd7m7s7hev23mqti

Review of anomalous sound event detection approaches

Amirul Sadikin Md Affendi, Marina Yusoff
2019 IAES International Journal of Artificial Intelligence (IJ-AI)  
This paper presents a review of anomalous sound event detection (SED) approaches.  ...  It is found that the state-of-the-art method found viable for SED uses log-mel energy features in a convolutional recurrent neural network (CRNN) with long short-term memory (LSTM), with a verification  ...  ACKNOWLEDGEMENTS Universiti Teknologi MARA for the grant 600-IRMI/PERDANA 5/3 BESTARI (096/2018), as well as the Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia  ... 
doi:10.11591/ijai.v8.i3.pp264-269 fatcat:gvrdtsepkzbcnb5dubjqifhl3a

Exploring convolutional, recurrent, and hybrid deep neural networks for speech and music detection in a large audio dataset

Diego de Benito-Gorron, Alicia Lozano-Diez, Doroteo T. Toledano, Joaquin Gonzalez-Rodriguez
2019 EURASIP Journal on Audio, Speech, and Music Processing  
We propose and compare two approaches. The first one is the training of two different neural networks, one for speech detection and another for music detection.  ...  We would like to highlight the performance of convolutional architectures, especially in combination with an LSTM stage.  ...  On the one hand, a voice activity detection stage allows the system to operate only over the relevant audio segments, namely, those which contain speech.  ... 
doi:10.1186/s13636-019-0152-1 fatcat:wdclqfeyt5aizlawzrhwauxrue

Deep Learning for Audio Signal Processing

Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schluter, Shuo-Yiin Chang, Tara N Sainath
2019 IEEE Journal on Selected Topics in Signal Processing  
Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing.  ...  ) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis).  ...  It is typically done with three basic approaches: a) acoustic scene classification, b) acoustic event detection, and c) tagging.  ... 
doi:10.1109/jstsp.2019.2908700 fatcat:oy2qixj2dfe6hns7r7av6fw2wm

Detecting Driver's Distraction using Long-term Recurrent Convolutional Network [article]

Chang Wei Tan, Mahsa Salehi, Geoffrey Mackellar
2020 arXiv   pre-print
In this study we demonstrate a novel Brain Computer Interface (BCI) approach to detect driver distraction events to improve road safety.  ...  We studied different TSC approaches and designed a Long-term Recurrent Convolutional Network (LRCN) model for this task.  ...  Acknowledgements We would like to thank Emotiv for providing the Epoc headset and data for this research and Professor Geoff Webb for his comments and guidance.  ... 
arXiv:2004.11839v1 fatcat:r3iz7eug6fb47hyznhqtxtjqre

Cross modal video representations for weakly supervised active speaker localization [article]

Rahul Sharma, Krishna Somandepalli, Shrikanth Narayanan
2021 arXiv   pre-print
We also demonstrate state-of-the-art performance for the task of voice activity detection in an audio-visual framework, especially when speech is accompanied by noise and music.  ...  We use the learned cross-modal visual representations, and provide weak supervision from movie subtitles acting as a proxy for voice activity, thus requiring no manual annotations.  ...  It comprises two tasks, i) voice activity detection in audio modality, and ii) active speaker localization in the visual modality.  ... 
arXiv:2003.04358v2 fatcat:bpdufkl34zf53mui3atxo76k74

Table of Contents [EDICS]

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Zhang 2585 A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization ...  Johnson 2283 Joining Sound Event Detection and Localization Through Spatial Segregation ... 
doi:10.1109/taslp.2020.3046150 fatcat:easrxuwl6zdppejsrf4bskxfw4

Filler Word Detection and Classification: A Dataset and Benchmark [article]

Ge Zhu, Juan-Pablo Caceres, Justin Salamon
2022 arXiv   pre-print
A key reason is the absence of a dataset with annotated filler words for model training and evaluation.  ...  In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word  ...  Segment-based metrics map the system output and ground truth to a fixed time grid for comparison. Event-based metrics compare the estimated sound events and the ground truth events directly.  ... 
arXiv:2203.15135v2 fatcat:lqd6r3iprraa7pxiqcaa2kiciq
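The PodcastFillers snippet above distinguishes segment-based metrics (system output and ground truth mapped onto a fixed time grid) from event-based metrics (estimated events matched directly to reference events). A minimal sketch of that distinction, with hypothetical helper names and a toy scoring scheme rather than any paper's actual evaluation code:

```python
# Toy illustration of segment-based vs. event-based SED scoring.
# Events are (onset, offset) tuples in seconds for a single sound class.

def segment_based_agreement(ref, est, grid=1.0, duration=10.0):
    """Map events onto a fixed time grid; score per-segment activity agreement."""
    def active(events, t):
        # A segment [t, t+grid) is active if any event overlaps it.
        return any(on < t + grid and off > t for on, off in events)
    total = int(duration / grid)
    hits = sum(active(ref, i * grid) == active(est, i * grid)
               for i in range(total))
    return hits / total

def event_based_recall(ref, est, onset_tol=0.2):
    """Match each estimated event to an unused reference event by onset time."""
    matched, used = 0, set()
    for on, _ in est:
        for j, (ron, _) in enumerate(ref):
            if j not in used and abs(on - ron) <= onset_tol:
                used.add(j)
                matched += 1
                break
    return matched / max(len(ref), 1)

ref = [(1.0, 3.0), (5.0, 6.5)]
est = [(1.1, 2.9), (5.4, 6.4)]
print(segment_based_agreement(ref, est))  # → 1.0 (same activity in every segment)
print(event_based_recall(ref, est))       # → 0.5 (second onset misses the tolerance)
```

The example shows why the two views can disagree: on a 1-second grid both estimates look perfect, while the stricter event-based onset tolerance rejects the second detection.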

The practice of speech and language processing in China

Jia Jia, Wei Chen, Kai Yu, Xiaodong He, Jun Du, Heung-Yeung Shum
2021 Communications of the ACM  
For sound-event detection, as shown in Figure 1, a multibeamforming-based approach is proposed: the diversified spatial information for the neural network is extracted using beamforming towards different  ...  highly expressive voice. For front-end text analysis, a cascaded, multitask BERT-LSTM model is adopted.  ... 
doi:10.1145/3481625 fatcat:f3itoui6vnez3ngw7yvf7iunry

A Comprehensive Review: Computational Models for Obstructive Sleep Apnea Detection in Biomedical Applications

E. Smily JeyaJothi, J. Anitha, Shalli Rani, Basant Tiwari, Yuvaraja Teekaraman
2022 BioMed Research International  
classification techniques employed for the detection and classification of OSA.  ...  The traditional diagnostic approach for OSA is the laboratory-based overnight polysomnography (PSG) sleep study, a tedious and labor-intensive process that aggravates the patient's discomfort  ...  Hence, feature selection plays a significant role in the classification stage. Li et al. [8] employed a two-stage procedure for feature selection.  ... 
doi:10.1155/2022/7242667 pmid:35224099 pmcid:PMC8866013 fatcat:mjcwveq2wvbr5o32liselabvmq

Table of Contents

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Han 2047 A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization ...  Yan 1452 Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection ... 
doi:10.1109/taslp.2020.3046148 fatcat:hirdphjf6zeqdjzwnwlwlamtb4
Showing results 1 — 15 out of 804 results