Deep neural networks for automatic detection of screams and shouted speech in subway trains

Pierre Laffitte, David Sodoyer, Charles Tatkeu, Laurent Girin
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Deep Neural Networks (DNNs) have recently become a popular technique for regression and classification problems.  ...  Their capacity to learn high-order correlations between input and output data proves to be very powerful for automatic speech recognition.  ...  For example, smart house concepts are currently being developed, involving automatic systems for domestic events detection using audio and video data streams [7, 8].  ... 
doi:10.1109/icassp.2016.7472921 dblp:conf/icassp/LaffitteSTG16 fatcat:do5qburxvncbbfddhwvlhl4pde

Connecting Web Event Forecasting with Anomaly Detection: A Case Study on Enterprise Web Applications Using Self-Supervised Neural Networks [article]

Xiaoyong Yuan, Lei Ding, Malek Ben Salem, Xiaolin Li, Dapeng Wu
2020 arXiv   pre-print
data, and sequence embedding techniques to integrate contextual events and capture dependencies among web events.  ...  DeepEvent provides a context-based system for researchers and practitioners to better forecast web events with situational awareness.  ...  Deep Neural Networks for Log Data Analysis Deep neural networks have been used to analyze log data.  ... 
arXiv:2008.13707v2 fatcat:dkfqyqgxonethnrrhaojqfz2qi

Learning behavioral context recognition with multi-stream temporal convolutional networks [article]

Aaqib Saeed, Tanir Ozcelebi, Stojan Trajanovski, Johan Lukkien
2018 arXiv   pre-print
on highly imbalanced and sparsely labeled dataset.  ...  Our empirical evaluation suggests that a deep convolutional network trained end-to-end achieves an optimal recognition rate.  ...  a specific sound event.  ... 
arXiv:1808.08766v1 fatcat:bruvasyaonhudidp4czg5t4gf4

Multimodal Depression Severity Prediction from medical bio-markers using Machine Learning Tools and Technologies [article]

Shivani Shimpi, Shyam Thombre, Snehal Reddy, Ritik Sharma, Srijan Singh
2020 arXiv   pre-print
when prompted with a question.  ...  The given approach attempts to detect, emphasize, and classify the features of a depressed person based on the low-level descriptors for verbal and visual features, and context of the language features  ...  We also thank the anonymous reviewers for their helpful comments.  ... 
arXiv:2009.05651v2 fatcat:sjrf66l3qrb7hh5o63nvrbnc7e

Filler Word Detection and Classification: A Dataset and Benchmark [article]

Ge Zhu, Juan-Pablo Caceres, Justin Salamon
2022 arXiv   pre-print
A key reason is the absence of a dataset with annotated filler words for training and evaluation.  ...  We make PodcastFillers publicly available, and hope our work serves as a benchmark for future research.  ...  We adopt a deep neural network (DNN) architecture that has been shown to be robust for VAD in complex environments with noise [15].  ... 
arXiv:2203.15135v1 fatcat:hwdloroumbhjbadmcer42dzgsu

Survey on deep learning with class imbalance

Justin M. Johnson, Taghi M. Khoshgoftaar
2019 Journal of Big Data  
deep learning techniques for addressing class imbalanced data.  ...  Effective classification with imbalanced data is an important area of research, as high class imbalance is naturally inherent in many real-world applications, e.g., fraud detection and cancer detection  ...  Atlantic University, for assistance with the reviews.  ... 
doi:10.1186/s40537-019-0192-5 fatcat:dor65fgn7ffhxmqqv3mkold6wq

Multimodal Co-learning: Challenges, Applications with Datasets, Recent Advances and Future Directions [article]

Anil Rahate, Rahee Walambe, Sheela Ramanna, Ketan Kotecha
2021 arXiv   pre-print
Multimodal deep learning systems which employ multiple modalities like text, image, audio, video, etc., are showing better performance in comparison with individual modalities (i.e., unimodal) systems.  ...  Our final goal is to discuss challenges and perspectives along with the important ideas and directions for future work that we hope to be beneficial for the entire research community focusing on this exciting  ...  It improved accuracy for audio-visual event localization even in the presence of noisy inputs.  ... 
arXiv:2107.13782v2 fatcat:s4spofwxjndb7leqbcqnwbifq4

Design and Development of AD-CGAN: Conditional Generative Adversarial Networks for Anomaly Detection

Okwudili M. Ezeme, Qusay H. Mahmoud, Akramul Azim
2020 IEEE Access  
a novel framework for anomaly detection.  ...  During testing, we do not use the single class CGAN, thereby providing us with a lean and efficient algorithm for anomaly detection that can do anomaly detection on semisupervised and non-parametric multivariate  ...  Recently, GANs [7] has emerged as an essential area of deep learning with groundbreaking results in mostly image, audio, and video applications.  ... 
doi:10.1109/access.2020.3025530 fatcat:clby2hrxpbggbhleb655srvqau

A Survey of Data Representation for Multi-Modality Event Detection and Evolution

Kejing Xiao, Zhaopeng Qian, Biao Qin
2022 Applied Sciences  
The goal of multi-modality event detection is to discover events from a huge amount of online data with different data structures, such as texts, images and videos.  ...  Next, we discuss the techniques of data representation for event detection, including textual, visual, and multi-modality content. Finally, we review event evolution under multi-modality data.  ...  [12] su event detection in audio and video. However, they discuss event detection for d modality separately, rather than combining the various modalities.  ... 
doi:10.3390/app12042204 fatcat:5gpezz6yhjejlmdzr5fhpgka6m

HEAR 2021: Holistic Evaluation of Audio Representations [article]

Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto (+11 others)
2022 arXiv   pre-print
What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning?  ...  The aim of the HEAR 2021 NeurIPS challenge is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.  ...  HEAR 2021 includes two types of tasks: 1) Scene-based: Multi-class or multi-label classification of an entire audio clip; 2) Timestamp-based: Sound event detection/transcription, which involves detecting  ... 
arXiv:2203.03022v2 fatcat:a36xyt2f4fapxpnf6afekabn3m

Automatic Assessment Of Singing Voice Pronunciation: A Case Study With Jingju Music

Rong Gong, Xavier Serra
2018 Zenodo  
This dissertation aims to develop data-driven audio signal processing and machine learning (deep learning) models for automatic singing voice assessment in audio collections of jingju music.  ...  Data-driven computational approaches require well-organized data for model training and testing, and we report the p [...]  ...  Acknowledgements I am grateful to my stuttering -my lifelong companion, who teaches me compassion for the weak, patience, modesty and to never give up.  ... 
doi:10.5281/zenodo.1490343 fatcat:f3mrhstkdff6ppmdadeasfuo7m

Assessing the Performances of different Neural Network Architectures for the Detection of Screams and Shouts in Public Transportation

Pierre Laffitte, Yun Wang, David Sodoyer, Laurent Girin
2018 Expert systems with applications  
The present article proposes an audio-based intelligent system for surveillance in public transportation, investigating the use of some state-of-the-art artificial intelligence methods for the automatic  ...  detection of screams and shouts.  ...  The DCASE challenge attests to the popularity of this task, which has many applications ranging from smart houses involving automatic systems for domestic events detection using audio and video data streams  ... 
doi:10.1016/j.eswa.2018.08.052 fatcat:5w67iau5vbhttl4lq63fk2hyxi

Handling Class Overlap and Imbalance to Detect Prompt Situations in Smart Homes

Barnan Das, Narayanan C. Krishnan, Diane J. Cook
2013 2013 IEEE 13th International Conference on Data Mining Workshops  
Our solution, ClusBUS, is a clustering-based undersampling technique that identifies data regions where minority class samples are embedded deep inside majority class.  ...  We are motivated to address the challenge of class overlap in the presence of imbalanced classes by a problem in pervasive computing.  ...  Our solution ClusBUS, is a clustering-based undersampling technique, that identifies data regions where minority class (prompt) samples are embedded deep inside majority class samples.  ... 
doi:10.1109/icdmw.2013.18 dblp:conf/icdm/DasKC13a fatcat:ika72pq4ifcutcad24ltl2pak4

Accelerometer-Based Human Fall Detection Using Convolutional Neural Networks

Guto Santos, Patricia Endo, Kayo Monteiro, Elisson Rocha, Ivanovitch Silva, Theo Lynn
2019 Sensors  
Unsurprisingly, human fall detection and prevention are a major focus of health research. In this article, we consider deep learning for fall detection in an IoT and fog computing environment.  ...  The best results are achieved when using data augmentation during the training process. The paper concludes with a discussion of challenges and future directions for research in this domain.  ...  Figure 3 illustrates a use case for a IoT-enabled connected healthcare system for detecting human falls using an end device (embedded with an accelerometer sensor), a fog device, and deep learning.  ... 
doi:10.3390/s19071644 fatcat:6pu6ztah3fgh7fxvn4f5t7ad3y

Detecting Deception in Political Debates Using Acoustic and Textual Features [article]

Daniel Kopev, Ahmed Ali, Ivan Koychev, Preslav Nakov
2019 arXiv   pre-print
We further developed a multimodal deep-learning architecture for the task of deception detection, which yielded sizable improvements over the state of the art for the CLEF-2018 Lab task 2.  ...  Starting with such data from the CLEF-2018 CheckThat! Lab, which was limited to text, we performed alignment to the corresponding videos, thus producing a multimodal dataset.  ...  Starting with the corresponding event videos, we used Kaldi 7 along with the Gentle forced aligner tool 8 to align the speech with the text of the claim and to obtain timestamps in the audio of the starting  ... 
arXiv:1910.01990v1 fatcat:tacodpuyazbyzonohqt5574bg4