
Multimodal Punctuation Prediction with Contextual Dropout [article]

Andrew Silva, Barry-John Theobald, Nicholas Apostoloff
2021 arXiv   pre-print
Finally, we present an approach to learning a model using contextual dropout that allows us to handle variable amounts of future context at test time. ... We first present a transformer-based approach for punctuation prediction that achieves an 8% improvement on the IWSLT 2012 TED Task, beating the previous state of the art [1]. ... no dropout, text-only with dropout, and multimodal with dropout. ...
arXiv:2102.11012v1 fatcat:7gwwo7aywfeh5l3w7zwjaro5oy
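
The contextual dropout named in this abstract, which handles variable amounts of future context at test time, can be pictured with a minimal sketch: during training, the lookahead past the current token is truncated at random, so one model copes with any lookahead budget at inference. The helper name and window sizes below are illustrative assumptions, not the paper's implementation.

```python
import random

def contextual_dropout(tokens, idx, max_future=10):
    """Training-time truncation (illustrative): randomly limit how many
    future tokens are visible past position `idx`, so the punctuation
    model learns to cope with any amount of lookahead at test time."""
    future = random.randint(0, max_future)   # sampled context width
    return tokens[: idx + 1 + future]        # past + current + sampled future

# Example: the visible window around "world" varies from call to call.
sentence = "hello world how are you today".split()
print(contextual_dropout(sentence, idx=1))
```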

Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus [article]

Yaoming Zhu, Liwei Wu, Shanbo Cheng, Mingxuan Wang
2022 arXiv   pre-print
This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate mixed sentences with a single model. ... Previous punctuation models, using either text only or demanding the corresponding audio, tend to be constrained in real-world settings, where unpunctuated sentences are a mixture of those with and without audio. ... Based on the hybrid representation, the model learns and predicts multimodal punctuation. ...
arXiv:2202.00468v1 fatcat:abruumq7tjfzlfdku3icr7mh7q
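
A minimal sketch of the mixed-modality setup UniPunc targets, assuming (as the abstract suggests) that sentences lacking audio are handled by the same model: a learned placeholder stands in for missing acoustic features before the hybrid representation is classified. Names and dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class HybridPunctuator(nn.Module):
    """Sketch: one punctuation model for sentences with and without audio."""

    def __init__(self, d_text=256, d_audio=256, n_classes=4):
        super().__init__()
        # Learned stand-in used whenever a sentence has no audio (assumption).
        self.virtual_audio = nn.Parameter(torch.zeros(d_audio))
        self.classifier = nn.Linear(d_text + d_audio, n_classes)

    def forward(self, text_feats, audio_feats=None):
        # text_feats: (batch, seq, d_text); audio_feats is optional.
        if audio_feats is None:
            b, s, _ = text_feats.shape
            audio_feats = self.virtual_audio.expand(b, s, -1)
        hybrid = torch.cat([text_feats, audio_feats], dim=-1)
        return self.classifier(hybrid)  # per-token punctuation logits
```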

Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech [article]

Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff
2020 arXiv   pre-print
In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. ... As an alternative, we explore attention-based multimodal fusion and compare its performance with forced-alignment-based fusion. ... Figure 1: An overview of our multimodal semi-supervised learning architecture for punctuation prediction. Table 1: Distribution of punctuation classes in the Fisher corpus. ...
arXiv:2008.00702v1 fatcat:mmu44d6r7nb7dbjjywow6ximyq
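
The attention-based fusion that this abstract contrasts with forced-alignment fusion could look roughly like the scaled dot-product cross-attention below, where each text token gathers acoustic context directly instead of relying on word-to-frame alignments. Shapes and sizes are assumptions.

```python
import torch
import torch.nn.functional as F

def attend_text_to_audio(text, audio):
    """Cross-attention fusion sketch.
    text:  (batch, n_tokens, d)    audio: (batch, n_frames, d)"""
    d = text.size(-1)
    scores = torch.matmul(text, audio.transpose(1, 2)) / d ** 0.5
    weights = F.softmax(scores, dim=-1)      # (batch, n_tokens, n_frames)
    acoustic = torch.matmul(weights, audio)  # attended audio per token
    return torch.cat([text, acoustic], dim=-1)

fused = attend_text_to_audio(torch.randn(2, 12, 64), torch.randn(2, 80, 64))
print(fused.shape)  # torch.Size([2, 12, 128])
```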

Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech

Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff
2020 Interspeech 2020  
In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. ... As an alternative, we explore attention-based multimodal fusion and compare its performance with forced-alignment-based fusion. ... Figure 1: An overview of our multimodal semi-supervised learning architecture for punctuation prediction. Table 1: Distribution of punctuation classes in the Fisher corpus. ...
doi:10.21437/interspeech.2020-3074 dblp:conf/interspeech/SunkaraRBBK20 fatcat:mcnbaynjqvexnldmqolu22egaa

Multimodal Learning for Cardiovascular Risk Prediction using EHR Data [article]

Ayoub Bagheri, T. Katrien J. Groenhof, Wouter B. Veldhuis, Pim A. de Jong, Folkert W. Asselbergs, Daniel L. Oberski
2020 arXiv   pre-print
To exploit the potential information captured in EHRs, in this study we propose a multimodal recurrent neural network model for cardiovascular risk prediction that integrates both medical texts and structured ... Various machine learning approaches have been developed to employ information in EHRs for risk prediction. ... In our multimodal RNN model, dropout and recurrent dropout are used with the BiLSTM layer. ...
arXiv:2008.11979v1 fatcat:4qgn4jtuxncihboeuca3wtxj7q
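
The pieces this abstract names, a BiLSTM over clinical text with dropout and recurrent dropout, fused with structured EHR features, fit together roughly as below. This is a sketch in the Keras API (where recurrent dropout is a built-in LSTM option); vocabulary size, sequence length, and feature counts are placeholders.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Text branch: tokenized clinical notes through an embedding + BiLSTM
# with dropout and recurrent dropout, as described in the abstract.
text_in = keras.Input(shape=(200,), name="note_tokens")
x = layers.Embedding(input_dim=20000, output_dim=128)(text_in)
x = layers.Bidirectional(layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2))(x)

# Structured branch: tabular EHR variables (labs, vitals, demographics).
struct_in = keras.Input(shape=(30,), name="structured_ehr")

# Fuse both modalities and predict cardiovascular risk.
merged = layers.Concatenate()([x, struct_in])
risk = layers.Dense(1, activation="sigmoid")(merged)
model = keras.Model([text_in, struct_in], risk)
```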

TokyoTech_NLP at SemEval-2019 Task 3: Emotion-related Symbols in Emotion Detection

Zhishen Yang, Sam Vijlbrief, Naoaki Okazaki
2019 Proceedings of the 13th International Workshop on Semantic Evaluation  
This paper presents our contextual emotion detection system for the SemEval-2019 shared task 3, EmoContext: Contextual Emotion Detection in Text. ... This system combines an emotion detection neural network method (Poria et al., 2017), emoji2vec embeddings (Eisner et al., 2016), word2vec embeddings (Mikolov et al., 2013), and our proposed emoticon ... Emotion detection, as part of sentiment analysis, can be conducted on a user's multimodal data, such as facial expressions and voice, in addition to text. ...
doi:10.18653/v1/s19-2061 dblp:conf/semeval/YangVO19 fatcat:ay2e3ifknrdjnndna7rplucgqe
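
Combining word2vec and emoji2vec embeddings, as this system does, usually amounts to a per-token lookup with fallbacks; the dict-based interface below is an assumption for illustration, not the authors' code.

```python
import numpy as np

def embed_token(token, word2vec, emoji2vec, dim=300):
    """Sketch: ordinary words come from word2vec, emoji from emoji2vec,
    and unknown tokens fall back to a zero vector. `word2vec` and
    `emoji2vec` are assumed to be dict-like {token: np.ndarray}."""
    if token in word2vec:
        return word2vec[token]
    if token in emoji2vec:
        return emoji2vec[token]
    return np.zeros(dim)
```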

gundapusunil at SemEval-2020 Task 8: Multimodal Memotion Analysis [article]

Sunil Gundapu, Radhika Mamidi
2020 arXiv   pre-print
Internet memes are in the form of images with witty, catchy, or sarcastic text descriptions. ... Our aim differs from the usual sentiment analysis goal of predicting whether a text expresses positive or negative sentiment; instead, we aim to classify the Internet meme as a positive, negative ... A dropout of 0.2 was applied to the input of the BiLSTM layer and a dropout of 0.1 was used for the output of the BiLSTM layer. ...
arXiv:2010.04470v1 fatcat:xnvsqmu7jncq5lsmxsgtp2tpzm
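
The dropout placement quoted above (0.2 on the BiLSTM input, 0.1 on its output) is straightforward to express; the module below is a hypothetical rendition of that one detail, not the full system.

```python
import torch
import torch.nn as nn

class MemeTextEncoder(nn.Module):
    """Sketch of the stated dropout placement around a BiLSTM."""

    def __init__(self, embed_dim=300, hidden=128):
        super().__init__()
        self.in_drop = nn.Dropout(0.2)   # dropout on the BiLSTM input
        self.bilstm = nn.LSTM(embed_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.out_drop = nn.Dropout(0.1)  # dropout on the BiLSTM output

    def forward(self, x):                # x: (batch, seq, embed_dim)
        h, _ = self.bilstm(self.in_drop(x))
        return self.out_drop(h)
```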

ELiRF-UPV at SemEval-2019 Task 3: Snapshot Ensemble of Hierarchical Convolutional Neural Networks for Contextual Emotion Detection

José-Ángel González, Lluís-F. Hurtado, Ferran Pla
2019 Proceedings of the 13th International Workshop on Semantic Evaluation  
This paper describes the approach developed by the ELiRF-UPV team for SemEval 2019 Task 3: Contextual Emotion Detection in Text. ... The proposed ensemble obtains better results than a single model and achieves competitive and promising results on Contextual Emotion Detection in Text. ... These contextual systems work on long conversations involving different users, and they use multimodal data (specifically text, audio, and video) to address the emotion detection problem ...
doi:10.18653/v1/s19-2031 dblp:conf/semeval/GonzalezHP19 fatcat:buch2zyqcrcznb7zzhaiekve3y
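
Snapshot ensembling, as named in this title, typically trains a single network under a cyclic learning rate and saves one snapshot per cycle, then averages their predictions. The cosine schedule and the `predict_proba` interface below are generic assumptions, not the team's exact recipe.

```python
import math

def snapshot_lr(step, steps_per_cycle, lr_max=0.01):
    """Cosine-annealed cyclic learning rate: high at each cycle start,
    near zero at each cycle end, where one snapshot is saved."""
    t = (step % steps_per_cycle) / steps_per_cycle
    return lr_max / 2 * (math.cos(math.pi * t) + 1)

def ensemble_predict(snapshots, x):
    """Average class probabilities over all saved snapshots
    (`predict_proba` is an assumed model interface)."""
    probs = [model.predict_proba(x) for model in snapshots]
    return sum(probs) / len(probs)
```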

Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings [article]

Nadee Seneviratne, Carol Espy-Wilson
2022 arXiv   pre-print
Multimodal depression classification has gained immense popularity in recent years. ... We show that, in the case of limited training data, a segment-level classifier can first be trained and then used to obtain a session-wise prediction without hindering performance, using a multi-stage convolutional ... preserve the contextual meaning. ...
arXiv:2202.06238v1 fatcat:23vpiots3bf4fcx2vce36gs6um
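
The segment-to-session step mentioned in this abstract can be approximated by pooling segment-level scores into one session decision; the mean pooling below is a simple stand-in for the paper's multi-stage convolutional aggregation.

```python
import numpy as np

def session_prediction(segment_scores, threshold=0.5):
    """segment_scores: per-segment depression probabilities for one
    session. Pool them, then threshold for the session-level label."""
    session_score = float(np.mean(segment_scores))
    return session_score, int(session_score >= threshold)

score, label = session_prediction(np.array([0.2, 0.7, 0.9, 0.4]))
print(score, label)  # 0.55 1
```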

InferNER: an attentive model leveraging the sentence-level information for Named Entity Recognition in Microblogs

Moemmur Shahzad, Ayesha Amin, Diego Esteves, Axel-Cyrille Ngonga Ngomo
2021 Proceedings of the ... International Florida Artificial Intelligence Research Society Conference  
With the multimodal model, our system also outperforms the current SOTA with an F1 score of 74% on the multimodal dataset. ... We also observe improvements on hard-to-predict entities such as creative-work and product compared to the current state of the art. ... Segregated Contextual Attention: we propose a segregated contextual attention module that considers text and image as separate modalities. ...
doi:10.32473/flairs.v34i1.128538 fatcat:plgdt4nroraahiep2gsbpyse7q
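
A segregated contextual attention module that treats text and image as separate modalities might pool each modality independently before fusion, roughly as follows; this structure is an assumption based only on the abstract.

```python
import torch
import torch.nn as nn

class SegregatedAttention(nn.Module):
    """Sketch: attention pooling applied to each modality separately,
    followed by concatenation of the pooled vectors."""

    def __init__(self, d=128):
        super().__init__()
        self.text_score = nn.Linear(d, 1)
        self.image_score = nn.Linear(d, 1)

    @staticmethod
    def pool(feats, scorer):
        w = torch.softmax(scorer(feats), dim=1)  # (batch, n, 1)
        return (w * feats).sum(dim=1)            # attention-weighted sum

    def forward(self, text_feats, image_feats):
        return torch.cat([self.pool(text_feats, self.text_score),
                          self.pool(image_feats, self.image_score)], dim=-1)
```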

Multimodal Machine Translation with Embedding Prediction [article]

Tosho Hirasawa and Hayahide Yamagishi and Yukio Matsumura and Mamoru Komachi
2019 arXiv   pre-print
Multimodal machine translation is an attractive application of neural machine translation (NMT): it helps computers understand visual objects and their relations with natural language. ... However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance when translating rare words. ... We integrate an embedding prediction framework (Kumar and Tsvetkov, 2019) with the multimodal machine translation model and take advantage of ...
arXiv:1904.00639v1 fatcat:hwl3cweulbetnc2dbc7hbz2px4
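
An embedding prediction framework, in the sense of the Kumar and Tsvetkov (2019) work cited here, replaces the decoder's softmax over the vocabulary with regression onto pretrained target-word embeddings. The cosine loss and nearest-neighbour decoding below are a simplified sketch of that idea, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def embedding_prediction_loss(pred, target_emb):
    """Train the decoder to output the target word's embedding directly
    (cosine distance) instead of a distribution over the vocabulary.
    pred, target_emb: (batch, d)"""
    return (1 - F.cosine_similarity(pred, target_emb, dim=-1)).mean()

def decode_word(pred, emb_table):
    """Inference: pick the vocabulary item whose pretrained embedding
    is nearest to the predicted vector. pred: (d,), emb_table: (V, d)"""
    sims = F.cosine_similarity(pred.unsqueeze(0), emb_table, dim=-1)
    return int(sims.argmax())
```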

Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate [article]

Austin Botelho and Bertie Vidgen and Scott A. Hale
2021 arXiv   pre-print
We find that all models perform better on content with full annotator agreement, and that multimodal models are best at classifying the content on which annotators disagree. ... We show that both text and visual enrichment improve model performance, with the multimodal model (0.771) outperforming the other models' F1 scores (0.544, 0.737, and 0.754). ... Lastly, deeper semantic contextualization may be achieved through the inclusion of multimodal data. ...
arXiv:2106.05903v1 fatcat:flvdprgoardofckxv3x6zkvr5m

Multimodal Machine Translation with Embedding Prediction

Tosho Hirasawa, Hayahide Yamagishi, Yukio Matsumura, Mamoru Komachi
2019 Proceedings of the 2019 Conference of the North  
Multimodal machine translation is an attractive application of neural machine translation (NMT): it helps computers understand visual objects and their relations with natural language. ... However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance when translating rare words. ... We integrate an embedding prediction framework (Kumar and Tsvetkov, 2019) with the multimodal machine translation model and take advantage of ...
doi:10.18653/v1/n19-3012 dblp:conf/naacl/HirasawaYMK19 fatcat:vtdvvlnhmndvhcbcqmwirgbqii

Hybrid Attention based Multimodal Network for Spoken Language Classification

Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic
2018 Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)  
We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. ... We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. ... We removed all punctuation, as spoken language does not provide such tokens. ...
pmid:30410219 pmcid:PMC6217979 fatcat:jhg2k65gpnh5bp4s7tpxd6d7wa
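
The modality attention this paper pairs with feature attention can be pictured as a learned soft weighting between utterance-level text and audio vectors; the module below is an illustrative guess at that mechanism, not the authors' code.

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Sketch: learn how much to trust the text encoder versus the
    audio encoder for each utterance, then fuse accordingly."""

    def __init__(self, d=128):
        super().__init__()
        self.score = nn.Linear(d, 1)

    def forward(self, text_vec, audio_vec):
        # text_vec, audio_vec: (batch, d) utterance-level features.
        stacked = torch.stack([text_vec, audio_vec], dim=1)  # (batch, 2, d)
        w = torch.softmax(self.score(stacked), dim=1)        # (batch, 2, 1)
        return (w * stacked).sum(dim=1)                      # fused: (batch, d)
```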

Controllable Neural Prosody Synthesis

Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore
2020 Interspeech 2020  
... (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. ... We address these limitations with a user-controllable, context-aware neural prosody generator. ... Our weak baseline is the original audio with the original punctuation. ...
doi:10.21437/interspeech.2020-2918 dblp:conf/interspeech/MorrisonJSBM20 fatcat:q4vr3p726fdzrfd6ntkbwy43ii
Showing results 1–15 of 264.