
Depression Assessment by Fusing High and Low Level Features from Audio, Video, and Text

Anastasia Pampouchidou, Kostas Marias, Fan Yang, Manolis Tsiknakis, Olympia Simantiraki, Amir Fazlollahi, Matthew Pediaditis, Dimitris Manousos, Alexandros Roniotis, Georgios Giannakakis, Fabrice Meriaudeau, Panagiotis Simos
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
In our approach, both high and low level features were assessed in each modality. Audio features were extracted from the low-level descriptors provided by the challenge organizers.  ...  Proposed approaches outperforming the reference classification accuracy include the one utilizing statistical descriptors of low-level audio features.  ...  Visual Features Although visual features were limited by the unavailability of raw video recordings, several meaningful (both high and low level) features could be extracted from the set of numbered 2D  ... 
doi:10.1145/2988257.2988266 dblp:conf/mm/PampouchidouSFP16 fatcat:ijcbd7jbozdk7cfs6nu3nfecey
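
The statistical descriptors of low-level audio features mentioned in this entry are typically functionals computed over frame-level descriptors. A minimal numpy sketch of the idea, assuming generic MFCC-like descriptors rather than the exact AVEC '16 baseline set:

```python
import numpy as np

def audio_functionals(llds: np.ndarray) -> np.ndarray:
    """Collapse frame-level low-level descriptors (LLDs) into one
    fixed-length vector of statistical functionals per recording.

    llds: array of shape (n_frames, n_descriptors), e.g. MFCC-like
    descriptors (the exact AVEC descriptor set is not assumed here).
    """
    stats = [
        llds.mean(axis=0),
        llds.std(axis=0),
        np.percentile(llds, 25, axis=0),
        np.median(llds, axis=0),
        np.percentile(llds, 75, axis=0),
        llds.min(axis=0),
        llds.max(axis=0),
    ]
    return np.concatenate(stats)

# Example: 500 frames of 39-dimensional descriptors -> one 273-dim vector.
frames = np.random.randn(500, 39)
print(audio_functionals(frames).shape)  # (273,)
```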

Multi-level Attention network using text, audio and video for Depression Prediction [article]

Anupama Ray, Siddharth Kumar, Rutvik Reddy, Prerana Mukherjee, Ritu Garg
2019 arXiv   pre-print
This paper presents a novel multi-level attention based network for multi-modal depression prediction that fuses features from audio, video and text modalities while learning the intra and inter modality  ...  We perform exhaustive experimentation to create different regression models for audio, video and text modalities.  ...  The network uses several low-level and mid-level features from both audio and video modalities and also sentence embeddings on the speech-to-text output of the participants.  ... 
arXiv:1909.01417v1 fatcat:qkyp2v5kzba7bj7inewz3rt3la
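
A minimal sketch of modality-level attention fusion in the spirit of this entry (PyTorch); the input dimensions, the single attention level, and the regression head are illustrative assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Score each modality embedding, softmax the scores into attention
    weights, and regress depression severity from the weighted sum."""
    def __init__(self, dim: int = 128):
        super().__init__()
        # Per-modality input dims are made up for the example.
        self.proj = nn.ModuleDict({
            m: nn.Linear(in_dim, dim)
            for m, in_dim in {"audio": 88, "video": 49, "text": 300}.items()
        })
        self.score = nn.Linear(dim, 1)  # one relevance score per modality
        self.head = nn.Linear(dim, 1)   # PHQ-style severity regression

    def forward(self, feats: dict) -> torch.Tensor:
        z = torch.stack(
            [torch.tanh(self.proj[m](x)) for m, x in feats.items()], dim=1)
        w = torch.softmax(self.score(z), dim=1)  # (batch, n_modalities, 1)
        fused = (w * z).sum(dim=1)
        return self.head(fused).squeeze(-1)

model = AttentionFusion()
batch = {"audio": torch.randn(4, 88),
         "video": torch.randn(4, 49),
         "text": torch.randn(4, 300)}
print(model(batch).shape)  # torch.Size([4])
```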

Multi-Modal Adaptive Fusion Transformer Network for the Estimation of Depression Level

Hao Sun, Jiaqing Liu, Shurong Chai, Zhaolin Qiu, Lanfen Lin, Xinyin Huang, Yenwei Chen
2021 Sensors  
For example, it is difficult to extract long-term temporal context information from long sequences of audio and visual data, and it is also difficult to select and fuse useful multi-modal information or  ...  Although numerous machine learning methods have been proposed for estimating the levels of depression via audio, visual, and audiovisual emotion sensing, several challenges still exist.  ...  [Table fragment] 0.696 (Audio/Video/Text); Ours, best: 0.733 (Audio/Video)  ... 
doi:10.3390/s21144764 fatcat:2nslvtc7rbg6dabygrvl2h5wn4

Detecting Depression using Vocal, Facial and Semantic Communication Cues

James R. Williamson, Elizabeth Godoy, Miriam Cha, Adrianne Schwarzentruber, Pooya Khorrami, Youngjune Gwon, Hsiang-Tsung Kung, Charlie Dagli, Thomas F. Quatieri
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
The features and depression classification system were developed for the 6th International Audio/Video Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units  ...  PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34.  ...  Additionally, our best PHQ correlation results were obtained by combining predictions from audio, video and text modalities.  ... 
doi:10.1145/2988257.2988263 dblp:conf/mm/WilliamsonGCSKG16 fatcat:czr572nyz5bg3lh4x2hevwd2pq
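
This entry fuses per-modality outputs from Gaussian staircase regressors; the sketch below shows only a generic late-fusion step (a weighted average of per-modality PHQ predictions), not the authors' regressor:

```python
import numpy as np

def late_fusion(preds, weights=None):
    """Fuse per-modality PHQ predictions by a (weighted) average.

    preds: dict mapping modality name -> array of per-subject predictions.
    weights: optional dict mapping modality name -> fusion weight.
    """
    names = sorted(preds)
    stack = np.stack([preds[m] for m in names])  # (n_modalities, n_subjects)
    if weights is None:
        return stack.mean(axis=0)
    w = np.array([weights[m] for m in names], dtype=float)
    return (w[:, None] * stack).sum(axis=0) / w.sum()

# Toy per-subject predictions from three modality subsystems.
phq = late_fusion(
    {"audio": np.array([8.0, 14.0]),
     "video": np.array([10.0, 12.0]),
     "text": np.array([9.0, 16.0])},
    weights={"audio": 1.0, "video": 0.5, "text": 1.5},
)
print(phq)  # fused per-subject PHQ estimates
```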

Decision Tree Based Depression Classification from Audio Video and Language Information

Le Yang, Dongmei Jiang, Lang He, Ercheng Pei, Meshia Cédric Oveneke, Hichem Sahli
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
The proposed gender-specific decision tree provides a way of fusing the upper-level language information with the results obtained using low-level audio and visual features.  ...  the development set, with F1 score reaching 0.857 for class depressed and 0.964 for class not depressed.  ...  Finally, the speech prosody features and text features are fused to detect depression by an SVM classifier.  ... 
doi:10.1145/2988257.2988269 dblp:conf/mm/YangJHPOS16 fatcat:5w6rl4dbazhmtj3zedkbobautm
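
A toy sketch of the gender-dependent branch-then-classify idea, with one SVM per gender over concatenated prosody and text features (scikit-learn); the data, dimensions, and the use of an SVM per branch in place of the full decision tree are assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy data: concatenated prosody + text feature vectors, binary labels.
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
gender = rng.integers(0, 2, size=100)  # 0 = female, 1 = male

# Gender-dependent models: one classifier per branch, mirroring the
# idea of splitting by gender before classifying.
models = {}
for g in (0, 1):
    mask = gender == g
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    models[g] = clf.fit(X[mask], y[mask])

pred = np.array([models[g].predict(x[None, :])[0]
                 for x, g in zip(X, gender)])
print((pred == y).mean())  # training accuracy on the toy data
```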

Towards Automatic Depression Detection: A BiLSTM/1D CNN-Based Model

Lin Lin, Xuri Chen, Ying Shen, Lin Zhang
2020 Applied Sciences  
In addition, our method utilizes audio and text features simultaneously. Therefore, it is more robust to misleading information provided by the patients.  ...  In this work, we propose a new automatic depression detection method utilizing speech signals and linguistic content from patient interviews.  ...  Sometimes, text content is also extracted from the audio and videos to improve diagnostic accuracy.  ... 
doi:10.3390/app10238701 fatcat:nwrkxqpgbbf4hdjt3y6inyfyty
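
One plausible reading of a BiLSTM/1D-CNN pairing over audio frames and text embeddings, sketched in PyTorch; which branch handles which modality, and all layer sizes, are assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMCNN(nn.Module):
    """BiLSTM over frame-level audio features plus a 1D CNN over token
    embeddings; the two clip-level vectors are concatenated for a
    binary depressed / not-depressed decision."""
    def __init__(self, audio_dim=40, text_dim=100, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(audio_dim, hidden,
                            batch_first=True, bidirectional=True)
        self.conv = nn.Sequential(
            nn.Conv1d(text_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.fc = nn.Linear(2 * hidden + hidden, 2)

    def forward(self, audio, text):
        _, (h, _) = self.lstm(audio)                     # h: (2, batch, hidden)
        a = torch.cat([h[0], h[1]], dim=-1)              # (batch, 2*hidden)
        t = self.conv(text.transpose(1, 2)).squeeze(-1)  # (batch, hidden)
        return self.fc(torch.cat([a, t], dim=-1))

model = BiLSTMCNN()
logits = model(torch.randn(2, 300, 40), torch.randn(2, 50, 100))
print(logits.shape)  # torch.Size([2, 2])
```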

Detecting Depression with Audio/Text Sequence Modeling of Interviews

Tuka Al Hanai, Mohammad Ghassemi, James Glass
2018 Interspeech 2018  
We utilized data from 142 individuals undergoing depression screening, and modeled the interactions with audio and text features in a Long Short-Term Memory (LSTM) neural network model to detect depression  ...  Medical professionals diagnose depression by interpreting the responses of individuals to a variety of questions, probing lifestyle changes and ongoing thoughts.  ...  while Pampouchidou et al. and Nasir et al. fused low- and high-level features [13, 14].  ... 
doi:10.21437/interspeech.2018-2522 dblp:conf/interspeech/HanaiGG18 fatcat:czzjyn2ntjewxkzfznplqfunp4

AudVowelConsNet: A phoneme-level based deep CNN architecture for clinical depression diagnosis

Muhammad Muzammel, Hanan Salam, Yann Hoffmann, Mohamed Chetouani, Alice Othmani
2020 Machine Learning with Applications  
In this paper, we propose an Artificial Intelligence (AI) based application for clinical depression recognition and assessment from speech.  ...  On the other hand, research in machine learning-based automatic recognition of depression from speech has focused on the exploration of various acoustic features for the detection of depression and its severity  ...  Deep-learned high-level feature descriptors: The low-level spectral features extracted from the audio vowels and consonants are fed to two different Convolutional Neural Networks (named Audio Vowels Net  ... 
doi:10.1016/j.mlwa.2020.100005 fatcat:q5f6pdncijbihbll6mvrlqtcqa
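
A minimal sketch of the parallel vowel/consonant CNN idea (PyTorch): two small spectrogram CNNs whose embeddings are concatenated for the diagnosis head. Layer depths and widths are illustrative, not the published AudVowelConsNet:

```python
import torch
import torch.nn as nn

def spectral_cnn(emb=32):
    """Small 2D CNN over a (1, n_mels, n_frames) spectrogram patch."""
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, emb),
    )

class VowelConsNet(nn.Module):
    """Two parallel CNNs, one fed vowel segments and one fed consonant
    segments; their embeddings are concatenated for classification."""
    def __init__(self):
        super().__init__()
        self.vowel_net = spectral_cnn()
        self.cons_net = spectral_cnn()
        self.head = nn.Linear(64, 2)

    def forward(self, vowel_spec, cons_spec):
        return self.head(torch.cat([self.vowel_net(vowel_spec),
                                    self.cons_net(cons_spec)], dim=-1))

net = VowelConsNet()
out = net(torch.randn(2, 1, 64, 40), torch.randn(2, 1, 64, 40))
print(out.shape)  # torch.Size([2, 2])
```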

Combining video, audio and lexical indicators of affect in spontaneous conversation via particle filtering

Arman Savran, Houwei Cao, Miraj Shah, Ani Nenkova, Ragini Verma
2012 Proceedings of the 14th ACM international conference on Multimodal interaction - ICMI '12  
We present experiments on fusing facial video, audio and lexical indicators for affect estimation during dyadic conversations.  ...  We use temporal statistics of texture descriptors extracted from facial video, a combination of various acoustic features, and lexical features to create regression based affect estimators for each modality  ...  Acknowledgments This work was supported by Grant Number R01-MH-073174 from the NIH.  ... 
doi:10.1145/2388676.2388781 pmid:25300451 pmcid:PMC4187218 dblp:conf/icmi/SavranCSNV12 fatcat:y73w6hogkbgzdf5zj5fmwa4haq
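
A minimal bootstrap particle filter illustrating how noisy per-modality affect estimates can be fused over time (numpy); the random-walk dynamics and the noise levels are assumptions, not the authors' model:

```python
import numpy as np

def particle_filter(obs, n_particles=500, proc_std=0.05, obs_std=0.3, seed=0):
    """Bootstrap particle filter tracking a slowly varying affect value.

    obs: (n_steps, n_modalities) noisy per-modality estimates, with NaN
    wherever a modality is missing at that step.
    """
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)
    track = []
    for frame in obs:
        particles += rng.normal(0.0, proc_std, n_particles)  # random walk
        weights = np.ones(n_particles)
        for z in frame[~np.isnan(frame)]:  # fuse whatever modalities exist
            weights *= np.exp(-0.5 * ((z - particles) / obs_std) ** 2)
        weights /= weights.sum()
        track.append(np.sum(weights * particles))
        idx = rng.choice(n_particles, n_particles, p=weights)  # resample
        particles = particles[idx]
    return np.array(track)

t = np.linspace(0, 2 * np.pi, 100)
truth = np.sin(t)
noisy = truth[:, None] + np.random.default_rng(1).normal(0, 0.3, (100, 3))
print(np.abs(particle_filter(noisy) - truth).mean())  # mean tracking error
```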

Staircase Regression in OA RVM, Data Selection and Gender Dependency in AVEC 2016

Zhaocheng Huang, Brian Stasak, Ting Dang, Kalani Wataraka Gamage, Phu Le, Vidhyasaharan Sethu, Julien Epps
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
For depression classification, we investigate token word selection, vocal tract coordination parameters computed from spectral centroid features, and gender-dependent classification systems.  ...  This submission to the Depression Classification and Continuous Emotion Prediction challenges for AVEC2016 investigates both, with a focus on audio subsystems.  ...  Vocal Tract Coordination Features Motivated by [19] and [20] , we examined the VTC features extracted from four sets of short-term acoustic features and six sets of video features for the depression  ... 
doi:10.1145/2988257.2988265 dblp:conf/mm/HuangSDGLSE16 fatcat:boq2umnkknhtfbaqqtytsetuqi
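
Vocal tract coordination (VTC) features are commonly built from channel-delay correlation matrices of short-term acoustic features. A numpy sketch of that construction, with an assumed delay set and an eigenvalue summary:

```python
import numpy as np

def vtc_features(X, delays=(0, 1, 3, 7)):
    """Channel-delay correlation features in the spirit of vocal tract
    coordination analysis: stack time-delayed copies of each feature
    channel, correlate them, and summarize by the eigenvalue spectrum.

    X: (n_frames, n_channels) short-term acoustic features.
    The delay set is an assumption, not the paper's configuration.
    """
    n, _ = X.shape
    max_d = max(delays)
    # Align delayed copies so all slices share the same length.
    stacked = np.hstack([X[max_d - d: n - d] for d in delays])
    corr = np.corrcoef(stacked, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

feats = vtc_features(np.random.randn(1000, 6))
print(feats[:5])  # largest eigenvalues of the channel-delay correlation
```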

Advances in Emotion Recognition: Link to Depressive Disorder [chapter]

Xiaotong Cheng, Xiaoxia Wang, Tante Ouyang, Zhengzhi Feng
2020 Mental Disorders [Working Title]  
A great number of features can be extracted from internal and external emotional signals by calculating their mean, standard deviation, transformations, wave-band power, peak counts, and other statistics  ...  According to Schachter and Singer's peripheral theory of emotion (or cognition-arousal theory), people assess their emotional state by physiological arousal.  ...  proposed new text and video features and hybridized deep and shallow models for depression estimation and classification from audio, video, and text descriptors.  ... 
doi:10.5772/intechopen.92019 fatcat:jmss4llbpnfrxcue6bzebsgmby
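
A short scipy/numpy sketch of the generic per-signal features the chapter lists (mean, standard deviation, band power, peak detection); the sampling rate and band edges are illustrative:

```python
import numpy as np
from scipy.signal import find_peaks, welch

def signal_features(x, fs=100.0, band=(4.0, 8.0)):
    """Mean, standard deviation, band power, and peak statistics of a
    1D emotional signal sampled at fs Hz."""
    f, psd = welch(x, fs=fs, nperseg=min(256, len(x)))
    in_band = (f >= band[0]) & (f <= band[1])
    band_power = np.trapz(psd[in_band], f[in_band])
    peaks, _ = find_peaks(x, distance=int(fs / 4))
    return {
        "mean": float(x.mean()),
        "std": float(x.std()),
        "band_power": float(band_power),
        "n_peaks": int(len(peaks)),
        "peak_rate_hz": len(peaks) / (len(x) / fs),
    }

sig = np.sin(2 * np.pi * 6.0 * np.linspace(0, 10, 1000))  # 6 Hz tone, 10 s
print(signal_features(sig))
```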

Intelligent system for depression scale estimation with facial expressions and case study in industrial intelligence

Lang He, Chenguang Guo, Prayag Tiwari, Hari Mohan Pandey, Wei Dang
2021 International Journal of Intelligent Systems  
This paper presents an end-to-end trainable intelligent system to generate high-level representations over the entire video clip.  ...  Specifically, a three-dimensional (3D) convolutional neural network equipped with a spatiotemporal feature aggregation module (STFAM) is trained from scratch on Audio/Visual Emotion Challenge (AVEC  ...  Open Access funding enabled and organized by Projekt DEAL. CONFLICT OF INTERESTS The authors declare that there are no conflicts of interest.  ... 
doi:10.1002/int.22426 fatcat:3pruu3nbx5hcpjndgasxpentri
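
A minimal 3D-CNN sketch in PyTorch: spatiotemporal convolutions over a face-video clip with global average pooling standing in for the paper's STFAM aggregation; all sizes are illustrative:

```python
import torch
import torch.nn as nn

class Tiny3DNet(nn.Module):
    """3D convolutions over a video clip followed by global
    spatiotemporal average pooling and a severity-regression head."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool3d(1)  # aggregate over time and space
        self.head = nn.Linear(32, 1)         # BDI/PHQ-style severity score

    def forward(self, clip):                 # clip: (batch, 3, T, H, W)
        z = self.pool(self.features(clip)).flatten(1)
        return self.head(z).squeeze(-1)

net = Tiny3DNet()
print(net(torch.randn(2, 3, 16, 64, 64)).shape)  # torch.Size([2])
```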

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances [article]

Yan Wang, Wei Song, Wei Tao, Antonio Liotta, Dawei Yang, Xinlei Li, Shuyong Gao, Yixuan Sun, Weifeng Ge, Wei Zhang, Wenqiang Zhang
2022 arXiv   pre-print
However, it is hard to detect inner emotions that a person purposely hides from facial expressions, audio tones, body gestures, etc.  ...  Thus, the fusion of physical information and physiological signals can provide useful features of emotional states and lead to higher accuracy.  ...  [Figure caption fragment] Feature-level fusion for (a) visual-audio emotion recognition, adapted from [349]; (b) text-audio emotion recognition, adapted from [350]; (c) visual-audio-text  ... 
arXiv:2203.06935v3 fatcat:h4t3omkzjvcejn2kpvxns7n2qe

Vision based body gesture meta features for Affective Computing [article]

Indigo J. D. Orton
2020 arXiv   pre-print
In my method, I extract pose estimates from videos, detect gestures within body parts, extract meta information from individual gestures, and finally aggregate these features to generate a small feature  ...  This differs from existing work by representing overall behaviour as a small set of aggregated meta features derived from a person's movement.  ...  Acknowledgements: First and foremost I thank my supervisor, Dr. Marwa Mahmoud; her guidance has been invaluable and her collaboration in the development of the dataset has been integral.  ... 
arXiv:2003.00809v1 fatcat:hijrcgettfgflibbas2tkb7enm
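
A numpy sketch of the gesture meta-feature idea: detect movement bursts of one joint from pose trajectories, then aggregate per-gesture statistics. The speed threshold, frame rate, and chosen statistics are assumptions:

```python
import numpy as np

def gesture_meta_features(keypoints, fs=30.0, move_thresh=0.02):
    """Detect movement bursts of one body part and aggregate per-gesture
    meta information (count, mean duration, mean peak speed).

    keypoints: (n_frames, 2) x/y positions of a single joint.
    """
    speed = np.linalg.norm(np.diff(keypoints, axis=0), axis=1) * fs
    moving = speed > move_thresh * fs
    # Contiguous moving segments = individual gestures.
    edges = np.flatnonzero(np.diff(moving.astype(int)))
    bounds = np.concatenate([[0], edges + 1, [len(moving)]])
    gestures = [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if moving[a]]
    if not gestures:
        return {"n_gestures": 0, "mean_duration_s": 0.0, "mean_peak_speed": 0.0}
    durations = [(b - a) / fs for a, b in gestures]
    peaks = [speed[a:b].max() for a, b in gestures]
    return {
        "n_gestures": len(gestures),
        "mean_duration_s": float(np.mean(durations)),
        "mean_peak_speed": float(np.mean(peaks)),
    }

# Toy wrist trajectory: a 2D random walk at 30 fps.
wrist = np.cumsum(np.random.default_rng(2).normal(0, 0.01, (300, 2)), axis=0)
print(gesture_meta_features(wrist))
```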

Topic Modeling Based Multi-modal Depression Detection

Yuan Gong, Christian Poellabauer
2017 Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge - AVEC '17  
The 2017 Audio/Visual Emotion Challenge (AVEC) asks participants to build a model to predict depression levels based on the audio, video, and text of an interview ranging from 7 to 33 minutes.  ...  Since averaging features over the entire interview loses most temporal information, discovering, capturing, and preserving useful temporal details for such a long interview is a significant challenge  ...  In [14] and [13], the text is analyzed on a subject level and audio/video features are separately extracted and then fused with semantic features, i.e., topic modeling is not used in these approaches  ... 
doi:10.1145/3133944.3133945 dblp:conf/mm/GongP17 fatcat:v5w44egyxvdrzj45ns35fbvgay
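
A simplified stand-in for topic-conditioned aggregation: pool frame-level features within each topic segment rather than over the whole interview, so coarse temporal/semantic structure survives. Topic assignments are taken as given here, whereas the paper derives them via topic modeling:

```python
import numpy as np

def topic_pooled_features(frames, topic_ids, n_topics):
    """Average frame-level features within each topic segment; topics
    never touched in the interview get zero vectors, so the output has
    a fixed length of n_topics * dim.

    frames: (n_frames, dim); topic_ids: (n_frames,) topic label per frame.
    """
    dim = frames.shape[1]
    pooled = np.zeros((n_topics, dim))
    for t in range(n_topics):
        mask = topic_ids == t
        if mask.any():
            pooled[t] = frames[mask].mean(axis=0)
    return pooled.ravel()

rng = np.random.default_rng(3)
feat = topic_pooled_features(rng.normal(size=(600, 20)),
                             rng.integers(0, 5, 600), 5)
print(feat.shape)  # (100,)
```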