A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Multimodal Punctuation Prediction with Contextual Dropout
[article]
2021
arXiv
pre-print
Finally, we present an approach to learning a model using contextual dropout that allows us to handle variable amounts of future context at test time. ...
We first present a transformer-based approach for punctuation prediction that achieves an 8% improvement on the IWSLT 2012 TED Task, beating the previous state of the art [1]. ...
... no dropout, text-only with dropout, and multimodal with dropout. ...
arXiv:2102.11012v1
fatcat:7gwwo7aywfeh5l3w7zwjaro5oy
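The contextual-dropout idea above lends itself to a short sketch: during training, randomly mask out tokens beyond a sampled future horizon so a single model copes with any amount of future context at test time. This is a minimal illustration of the general mechanism, not the paper's implementation; all names and hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

def future_context_mask(seq_len: int, max_future: int) -> torch.Tensor:
    """Sample a future horizon k and return a boolean attention mask
    (True = blocked) hiding tokens more than k steps ahead of each query."""
    k = int(torch.randint(0, max_future + 1, (1,)))   # fresh k each batch
    idx = torch.arange(seq_len)
    return idx.unsqueeze(0) > idx.unsqueeze(1) + k    # mask[i, j] = j > i + k

# hypothetical encoder; resampling the mask every batch trains one model
# that can be run with any future-context budget at test time
layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
x = torch.randn(8, 32, 64)                            # (batch, seq, d_model)
out = encoder(x, mask=future_context_mask(32, max_future=4))
```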
Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus
[article]
2022
arXiv
pre-print
This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. ...
Previous punctuation models, either using text only or demanding the corresponding audio, tend to be constrained by real scenes, where unpunctuated sentences are a mixture of those with and without audio ...
Based on the hybrid representation, the model learns and predicts multimodal punctuation. ...
arXiv:2202.00468v1
fatcat:abruumq7tjfzlfdku3icr7mh7q
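One plausible reading of the hybrid representation described above is a learned placeholder that stands in for acoustic features when a sentence has no audio, so one model serves the mixed corpus. A minimal sketch under that assumption; the module and parameter names are mine, not UniPunc's.

```python
import torch
import torch.nn as nn

class HybridFusion(nn.Module):
    """Hypothetical sketch: text attends to audio features when they exist,
    and to a single learned placeholder vector when they do not, so one
    model handles the audio/no-audio mixture."""
    def __init__(self, d_model: int = 256, heads: int = 4):
        super().__init__()
        self.placeholder = nn.Parameter(torch.zeros(1, 1, d_model))
        self.cross = nn.MultiheadAttention(d_model, heads, batch_first=True)

    def forward(self, text_h, audio_h=None):
        # text_h: (B, T, d); audio_h: (B, A, d) or None for text-only input
        if audio_h is None:
            audio_h = self.placeholder.expand(text_h.size(0), -1, -1)
        fused, _ = self.cross(text_h, audio_h, audio_h)
        return text_h + fused  # residual hybrid representation
```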
Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech
[article]
2020
arXiv
pre-print
In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. ...
As an alternative, we explore attention based multimodal fusion and compare its performance with forced alignment based fusion. ...
Figure 1: An overview of our multimodal semi-supervised learning architecture for punctuation prediction.
Table 1: Distribution of punctuation classes in the Fisher corpus. ...
arXiv:2008.00702v1
fatcat:mmu44d6r7nb7dbjjywow6ximyq
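The snippet contrasts forced-alignment fusion with attention-based fusion. A compact sketch of both, under the assumption that forced alignment yields per-word frame spans while attention lets each word weight all frames softly; the function names are illustrative.

```python
import torch

def alignment_fusion(audio, spans):
    """Forced-alignment fusion: average the frames aligned to each word.
    audio: (A, d); spans: [(start_frame, end_frame)] per word."""
    return torch.stack([audio[s:e].mean(dim=0) for s, e in spans])

def attention_fusion(text, audio):
    """Attention fusion: each word softly weights all frames, so no
    explicit word-to-frame alignment is needed. text: (T, d); audio: (A, d)."""
    scores = text @ audio.T / text.size(-1) ** 0.5    # (T, A)
    return torch.softmax(scores, dim=-1) @ audio      # (T, d)
```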
Multimodal Semi-Supervised Learning Framework for Punctuation Prediction in Conversational Speech
2020
Interspeech 2020
In this work, we explore a multimodal semi-supervised learning approach for punctuation prediction by learning representations from large amounts of unlabelled audio and text data. ...
As an alternative, we explore attention based multimodal fusion and compare its performance with forced alignment based fusion. ...
Figure 1: An overview of our multimodal semi-supervised learning architecture for punctuation prediction.
Table 1: Distribution of punctuation classes in the Fisher corpus. ...
doi:10.21437/interspeech.2020-3074
dblp:conf/interspeech/SunkaraRBBK20
fatcat:mcnbaynjqvexnldmqolu22egaa
Multimodal Learning for Cardiovascular Risk Prediction using EHR Data
[article]
2020
arXiv
pre-print
To exploit the potential information captured in EHRs, in this study we propose a multimodal recurrent neural network model for cardiovascular risk prediction that integrates both medical texts and structured ...
Various machine learning approaches have been developed to employ information in EHRs for risk prediction. ...
In our multimodal RNN model, dropout and recurrent dropout are used with the BiLSTM layer.
Concatenation Layer. ...
arXiv:2008.11979v1
fatcat:4qgn4jtuxncihboeuca3wtxj7q
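The snippet mentions dropout and recurrent dropout on the BiLSTM plus a concatenation layer joining text and structured EHR features. Keras exposes recurrent dropout directly, so a minimal sketch might look like this; it is not the paper's architecture, and all sizes and rates are assumed.

```python
import tensorflow as tf

def build_model(vocab=20000, seq_len=200, n_struct=40):
    notes = tf.keras.Input((seq_len,), name="clinical_notes")
    struct = tf.keras.Input((n_struct,), name="structured_ehr")
    x = tf.keras.layers.Embedding(vocab, 128)(notes)
    # dropout masks the inputs; recurrent_dropout masks the recurrent state
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, dropout=0.3, recurrent_dropout=0.3))(x)
    h = tf.keras.layers.Concatenate()([x, struct])    # the concatenation layer
    risk = tf.keras.layers.Dense(1, activation="sigmoid")(h)
    return tf.keras.Model([notes, struct], risk)
```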
TokyoTech_NLP at SemEval-2019 Task 3: Emotion-related Symbols in Emotion Detection
2019
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper presents our contextual emotion detection system in approaching the SemEval-2019 shared task 3: EmoContext: Contextual Emotion Detection in Text. ...
This system cooperates with an emotion detection neural network method ( Poria et al., 2017), emoji2vec (Eisner et al., 2016) embedding, word2vec embedding (Mikolov et al., 2013), and our proposed emoticon ...
Emotion detection as part of sentiment analysis can be conducted with user's multimodal data such as facial expression and voice data in addition to text data. ...
doi:10.18653/v1/s19-2061
dblp:conf/semeval/YangVO19
fatcat:ay2e3ifknrdjnndna7rplucgqe
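A common way to combine word2vec and emoji2vec embeddings, as this system does, is to look a token up in both tables and concatenate the vectors. A minimal sketch, assuming plain dicts mapping token to vector and zero-filling misses; the paper's exact scheme may differ.

```python
import numpy as np

def embed_token(token, word2vec, emoji2vec, dim_w=300, dim_e=300):
    """Concatenate the word2vec and emoji2vec vectors for a token,
    zero-filling whichever table misses it. Both tables are plain
    dicts of token -> np.ndarray here."""
    w = word2vec.get(token, np.zeros(dim_w))
    e = emoji2vec.get(token, np.zeros(dim_e))
    return np.concatenate([w, e])                     # (dim_w + dim_e,)
```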
gundapusunil at SemEval-2020 Task 8: Multimodal Memotion Analysis
[article]
2020
arXiv
pre-print
Internet memes are in the form of images with witty, catchy, or sarcastic text descriptions. ...
Our aim is different than the normal sentiment analysis goal of predicting whether a text expresses positive or negative sentiment; instead, we aim to classify the Internet meme as a positive, negative ...
A dropout of 0.2 was applied to the input of the BiLSTM layer and a dropout of 0.1 was used for the output of BiLSTM layer. ...
arXiv:2010.04470v1
fatcat:xnvsqmu7jncq5lsmxsgtp2tpzm
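The dropout placement quoted above (0.2 on the BiLSTM input, 0.1 on its output) is concrete enough to sketch; everything around the BiLSTM is assumed.

```python
import torch.nn as nn

class TextBranch(nn.Module):
    """Dropout placement as quoted: 0.2 on the BiLSTM input,
    0.1 on its output; the surrounding model is assumed."""
    def __init__(self, d_in=300, d_hid=128):
        super().__init__()
        self.drop_in = nn.Dropout(0.2)
        self.bilstm = nn.LSTM(d_in, d_hid, bidirectional=True, batch_first=True)
        self.drop_out = nn.Dropout(0.1)

    def forward(self, x):                  # x: (B, T, d_in) embedded tokens
        h, _ = self.bilstm(self.drop_in(x))
        return self.drop_out(h)            # (B, T, 2 * d_hid)
```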
ELiRF-UPV at SemEval-2019 Task 3: Snapshot Ensemble of Hierarchical Convolutional Neural Networks for Contextual Emotion Detection
2019
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes the approach developed by the ELiRF-UPV team at SemEval 2019 Task 3: Contextual Emotion Detection in Text. ...
The proposed ensemble obtains better results than a single model and it obtains competitive and promising results on Contextual Emotion Detection in Text. ...
These contextual systems work on long conversations where different users are involved and they use multimodal data, specifically, text, audio and video in order to address the emotion detection problem ...
doi:10.18653/v1/s19-2031
dblp:conf/semeval/GonzalezHP19
fatcat:buch2zyqcrcznb7zzhaiekve3y
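The title names a snapshot ensemble; in the usual sense (Huang et al., 2017) this means cyclic learning-rate annealing with one model snapshot saved per cycle and predictions averaged at test time. A hedged sketch of that general recipe, not necessarily this team's exact setup:

```python
import copy
import torch
import torch.nn.functional as F

def snapshot_train(model, loader, epochs=30, cycles=5, lr=0.1):
    """Cyclic cosine annealing; save one snapshot at the end of each cycle."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        opt, T_0=epochs // cycles)
    snapshots = []
    for epoch in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
        sched.step()
        if (epoch + 1) % (epochs // cycles) == 0:
            snapshots.append(copy.deepcopy(model).eval())
    return snapshots

def ensemble_predict(snapshots, x):
    """Average the softmax outputs of all saved snapshots."""
    with torch.no_grad():
        return torch.stack([torch.softmax(m(x), -1) for m in snapshots]).mean(0)
```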
Multimodal Depression Classification Using Articulatory Coordination Features And Hierarchical Attention Based Text Embeddings
[article]
2022
arXiv
pre-print
Multimodal depression classification has gained immense popularity in recent years. ...
We show that in the case of limited training data, a segment-level classifier can first be trained to then obtain a session-wise prediction without hindering the performance, using a multi-stage convolutional ...
... preserve the contextual meaning. ...
arXiv:2202.06238v1
fatcat:23vpiots3bf4fcx2vce36gs6um
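The two-stage scheme described (train a segment-level classifier, then derive a session-wise prediction) can be sketched as scoring each segment and pooling the probabilities; mean pooling is my assumption, and the paper's aggregation may differ.

```python
import torch

def session_prediction(segment_model, segments):
    """Score each segment with the trained segment-level classifier,
    then mean-pool the probabilities into one session-level decision."""
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(segment_model(s), dim=-1) for s in segments])
    return probs.mean(dim=0)               # session-level class probabilities
```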
InferNER: an attentive model leveraging the sentence-level information for Named Entity Recognition in Microblogs
2021
Proceedings of the ... International Florida Artificial Intelligence Research Society Conference
With the multimodal model, our system also outperforms the current SOTA with an F1 score of 74% on the multimodal dataset. ...
We also observe improvements over hard-to-predict entities such as creative-work and product compared to the current state of the art. ...
Segregated Contextual Attention. We propose a segregated contextual attention module that considers text and image as separate modalities. ...
doi:10.32473/flairs.v34i1.128538
fatcat:plgdt4nroraahiep2gsbpyse7q
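A minimal sketch of what segregated contextual attention could look like, treating text and image with separate attention modules before any fusion; the module below is illustrative, not the authors' code.

```python
import torch.nn as nn

class SegregatedAttention(nn.Module):
    """Attend over text and image features separately, one attention
    module per modality, before any cross-modal fusion."""
    def __init__(self, d=256, heads=4):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.img_attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, text_h, img_h):      # (B, T, d), (B, R, d)
        t, _ = self.text_attn(text_h, text_h, text_h)
        v, _ = self.img_attn(img_h, img_h, img_h)
        return t, v                        # fused downstream, e.g. concatenated
```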
Multimodal Machine Translation with Embedding Prediction
[article]
2019
arXiv
pre-print
Multimodal machine translation is an attractive application of neural machine translation (NMT). It helps computers to deeply understand visual objects and their relations with natural languages. ...
However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance for translating rare words. ...
Multimodal Machine Translation with Embedding Prediction We integrate an embedding prediction framework (Kumar and Tsvetkov, 2019) with the multimodal machine translation model and take advantage of ...
arXiv:1904.00639v1
fatcat:hwl3cweulbetnc2dbc7hbz2px4
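Embedding prediction (Kumar and Tsvetkov, 2019) replaces the softmax over the vocabulary with a continuous output trained to match the pretrained embedding of the gold word, which helps rare words. A sketch with a cosine loss standing in for their von Mises-Fisher loss:

```python
import torch
import torch.nn.functional as F

def embedding_prediction_loss(decoder_out, target_ids, emb_table):
    """Train the decoder output vector to match the pretrained embedding
    of the gold word instead of scoring the whole vocabulary."""
    gold = emb_table[target_ids]                          # (N, d)
    return 1 - F.cosine_similarity(decoder_out, gold, dim=-1).mean()

def decode_step(decoder_out, emb_table):
    """Predict the word whose pretrained embedding is nearest in cosine."""
    sims = F.normalize(decoder_out, dim=-1) @ F.normalize(emb_table, dim=-1).T
    return sims.argmax(dim=-1)
```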
Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate
[article]
2021
arXiv
pre-print
We find that all models perform better on content with full annotator agreement and that multimodal models are best at classifying the content where annotators disagree. ...
We show that both text- and visual- enrichment improves model performance, with the multimodal model (0.771) outperforming other models' F1 scores (0.544, 0.737, and 0.754). ...
Multimodal. Lastly, deeper semantic contextualization may be achieved through the inclusion of multimodal data. ...
arXiv:2106.05903v1
fatcat:flvdprgoardofckxv3x6zkvr5m
Multimodal Machine Translation with Embedding Prediction
2019
Proceedings of the 2019 Conference of the North
Multimodal machine translation is an attractive application of neural machine translation (NMT). It helps computers to deeply understand visual objects and their relations with natural languages. ...
However, multimodal NMT systems suffer from a shortage of available training data, resulting in poor performance for translating rare words. ...
Multimodal Machine Translation with Embedding Prediction We integrate an embedding prediction framework (Kumar and Tsvetkov, 2019) with the multimodal machine translation model and take advantage of ...
doi:10.18653/v1/n19-3012
dblp:conf/naacl/HirasawaYMK19
fatcat:vtdvvlnhmndvhcbcqmwirgbqii
Hybrid Attention based Multimodal Network for Spoken Language Classification
2018
Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. ...
We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. ...
We removed all punctuation, as spoken language does not provide such tokens. ...
pmid:30410219
pmcid:PMC6217979
fatcat:jhg2k65gpnh5bp4s7tpxd6d7wa
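Modality attention, as named in the snippet, can be read as learning a weight per modality vector and taking the weighted sum. A minimal sketch under that reading; dimensions and names are assumed.

```python
import torch
import torch.nn as nn

class ModalityAttention(nn.Module):
    """Learn a scalar weight per modality vector and take the weighted
    sum, letting the model decide per utterance how much to trust
    linguistic content versus vocal characteristics."""
    def __init__(self, d=256):
        super().__init__()
        self.score = nn.Linear(d, 1)

    def forward(self, text_vec, audio_vec):            # each (B, d)
        m = torch.stack([text_vec, audio_vec], dim=1)  # (B, 2, d)
        a = torch.softmax(self.score(m), dim=1)        # (B, 2, 1)
        return (a * m).sum(dim=1)                      # (B, d)
```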
Controllable Neural Prosody Synthesis
2020
Interspeech 2020
... (e.g., misplaced emphases and contextually inappropriate emotions) or generate prosodies with diverse speaker excitement levels and emotions. ...
We address these limitations with a user-controllable, context-aware neural prosody generator. ...
Our weak baseline is the original audio with the original punctuation. ...
doi:10.21437/interspeech.2020-2918
dblp:conf/interspeech/MorrisonJSBM20
fatcat:q4vr3p726fdzrfd6ntkbwy43ii
Showing results 1 — 15 out of 264 results