16,778 Hits in 3.3 sec

Multimodal Dialogue Response Generation [article]

Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang
2022 arXiv   pre-print
To fill in the gaps, we first present a multimodal dialogue generation model, which takes the dialogue history as input, then generates a textual sequence or an image as the response.  ...  Yet existing works only focus on exploring multimodal dialogue models that depend on retrieval-based methods, while neglecting generation methods.  ...  Figure 2: The overview of our multimodal dialogue response generation model.  ... 
arXiv:2110.08515v2 fatcat:gem47w2jn5dufjmw2yk3u6hap4

Generation and evaluation of user tailored responses in multimodal dialogue

M.A. Walker, S.J. Whittaker, A. Stent, P. Maloor, J. Moore, M. Johnston, G. Vasireddy
2004 Cognitive Science  
We describe a multimodal dialogue system and algorithms for adaptive content selection based on multi-attribute decision theory.  ...  We demonstrate experimentally the improved efficacy of system responses through the use of user models both to tailor the content of system utterances and to manipulate their conciseness.  ... 
doi:10.1207/s15516709cog2805_8 fatcat:43n7kih3pfcmlfnmwtiarw7y54
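The abstract above mentions adaptive content selection based on multi-attribute decision theory. As a rough illustration of that general idea, here is a minimal sketch of additive multi-attribute utility scoring for tailoring response content; the attribute names, weights, and options are hypothetical and not taken from the paper.

```python
# Sketch of content selection via an additive multi-attribute utility model.
# All attributes, weights, and options below are illustrative assumptions.

def utility(option, weights):
    """Weighted sum of an option's normalized attribute values."""
    return sum(weights[attr] * value for attr, value in option["attrs"].items())

def select_content(options, weights, k=2):
    """Rank options by user-specific utility and keep only the top-k,
    tailoring the content of a system response and keeping it concise."""
    return sorted(options, key=lambda o: utility(o, weights), reverse=True)[:k]

# Hypothetical user model: this user weights price most heavily.
user_weights = {"price": 0.6, "distance": 0.3, "rating": 0.1}

restaurants = [
    {"name": "Bistro A", "attrs": {"price": 0.9, "distance": 0.2, "rating": 0.5}},
    {"name": "Cafe B",   "attrs": {"price": 0.3, "distance": 0.9, "rating": 0.8}},
    {"name": "Diner C",  "attrs": {"price": 0.7, "distance": 0.6, "rating": 0.4}},
]

best = select_content(restaurants, user_weights, k=2)
print([r["name"] for r in best])  # the two highest-utility options for this user
```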

A non-hierarchical attention network with modality dropout for textual response generation in multimodal dialogue systems [article]

Rongyi Sun, Borun Chen, Qingyu Zhou, Yinghui Li, YunBo Cao, Hai-Tao Zheng
2021 arXiv   pre-print
To evaluate our proposed model, we conduct comprehensive experiments on a public multimodal dialogue dataset.  ...  Existing text- and image-based multimodal dialogue systems use the traditional Hierarchical Recurrent Encoder-Decoder (HRED) framework, which has an utterance-level encoder to model utterance representation  ...  CONCLUSION In this paper, we propose a non-hierarchical attention network with a modality dropout strategy for textual response generation in multimodal dialogue systems, which allows for  ... 
arXiv:2110.09702v2 fatcat:cqwizhfypnbbzmkbogt5ve4d6q
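The abstract above names a modality dropout strategy. As a hedged sketch of what such a training-time step might look like in general — the dropout rate, feature shapes, and the even choice of which modality to drop are assumptions, not the paper's actual design:

```python
# Illustrative modality-dropout step: during training, one entire modality's
# features are occasionally zeroed so the model cannot over-rely on a single
# modality. The rate p and the 50/50 modality choice are assumptions.
import random

def modality_dropout(text_feat, image_feat, p=0.3, training=True, rng=random):
    """With probability p, zero out one randomly chosen modality's features."""
    if training and rng.random() < p:
        if rng.random() < 0.5:
            text_feat = [0.0] * len(text_feat)
        else:
            image_feat = [0.0] * len(image_feat)
    return text_feat, image_feat
```

At inference time (`training=False`) both modalities always pass through unchanged.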

Ordinal and Attribute Aware Response Generation in a Multimodal Dialogue System

Hardik Chauhan, Mauajama Firdaus, Asif Ekbal, Pushpak Bhattacharyya
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
Multimodal dialogue systems have opened new frontiers in traditional goal-oriented dialogue systems.  ...  Our evaluation shows that the proposed model can generate appropriate responses while preserving the position and attribute information.  ...  component of an end-to-end chatbot, by including image generation and retrieval systems for the completion of a multimodal dialogue system.  ... 
doi:10.18653/v1/p19-1540 dblp:conf/acl/ChauhanFEB19 fatcat:nmmscmcp5nfi3lud562n32lffu

Multimodal Dialogue Response Generation

Qingfeng Sun, Yujing Wang, Can Xu, Kai Zheng, Yaming Yang, Huang Hu, Fei Xu, Jessica Zhang, Xiubo Geng, Daxin Jiang
2022 Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)   unpublished
To fill in the gaps, we first present a new task: multimodal dialogue response generation (MDRG): given the dialogue context, a model needs to generate a text or an image as the response.  ...  Yet existing works only focus on exploring multimodal dialogue models that depend on retrieval-based methods, while neglecting generation methods.  ...  Figure 2: The overview of our multimodal dialogue response generation model.  ... 
doi:10.18653/v1/2022.acl-long.204 fatcat:u6g7dgpbgvdatoo3qh33bc4aym

Improving Context Modelling in Multimodal Dialogue Generation [article]

Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser
2018 arXiv   pre-print
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system.  ...  Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain.  ...  Conclusion and Future Work: In this research, we address the novel task of response generation in search-based multimodal dialogue by learning from the recently released Multimodal Dialogue (MMD) dataset  ... 
arXiv:1810.11955v1 fatcat:7eb55m7iejfsbo5qvu4huhhgcq

Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog [article]

Zekang Li, Zongjia Li, Jinchao Zhang, Yang Feng, Cheng Niu, Jie Zhou
2020 arXiv   pre-print
Our method extends a pre-trained natural language generation model to the multimodal dialogue generation task.  ...  fluent responses.  ...  Then we will present our multimodal dialogue generation model and its training methods.  ... 
arXiv:2002.00163v1 fatcat:t36c5mri7vhyrc7mj2d7vfymb4

The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service [article]

Nan Zhao, Haoran Li, Youzheng Wu, Xiaodong He, Bowen Zhou
2021 arXiv   pre-print
We present the solutions of the top-5 teams participating in the JDDC multimodal dialogue challenge based on this dataset, which provide valuable insights for further research on the multimodal dialogue  ...  Thus, bridging the gap between image and text is crucial for the multimodal dialogue task.  ...  Acknowledgements We would like to thank all participants of the JDDC 2021 dialogue challenge for providing multimodal dialogue solutions.  ... 
arXiv:2109.12913v1 fatcat:hvqk3l5gareybkobpkx3iaxghq

Towards Expressive Communication with Internet Memes: A New Multimodal Conversation Dataset and Benchmark [article]

Zhengcong Fei, Zekang Li, Jinchao Zhang, Yang Feng, Jie Zhou
2021 arXiv   pre-print
To facilitate the MOD research, we construct a large-scale open-domain multimodal dialogue dataset incorporating abundant Internet memes into utterances.  ...  Compared to previous dialogue tasks, MOD is much more challenging since it requires the model to understand the multimodal elements as well as the emotions behind them.  ...  Specifically, provided with a multimodal dialogue context, the MOD task aims to generate a vivid response in text-only, meme-only, or mixed information, which can be considered a general paradigm compared  ... 
arXiv:2109.01839v1 fatcat:p2zd2drhcrfcrlmw23nka7jt3m

Improving Context Modelling in Multimodal Dialogue Generation

Shubham Agarwal, Ondřej Dušek, Ioannis Konstas, Verena Rieser
2018 Proceedings of the 11th International Conference on Natural Language Generation  
In this work, we investigate the task of textual response generation in a multimodal task-oriented dialogue system.  ...  Our work is based on the recently released Multimodal Dialogue (MMD) dataset (Saha et al., 2017) in the fashion domain.  ...  Conclusion and Future Work: In this research, we address the novel task of response generation in search-based multimodal dialogue by learning from the recently released Multimodal Dialogue (MMD) dataset  ... 
doi:10.18653/v1/w18-6514 dblp:conf/inlg/AgarwalDKR18 fatcat:ijnkyomdujfzri6mzlnyhlmupm

Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation [article]

Feilong Chen, Fandong Meng, Xiuyi Chen, Peng Li, Jie Zhou
2021 arXiv   pre-print
generates a contextually and visually coherent response.  ...  On the basis of visual grounding, the multimodal incremental transformer encodes the multi-turn dialogue history combined with visual scene step by step according to the order of the dialogue and then  ...  understand the dialogue content, thus generating better responses.  ... 
arXiv:2109.08478v1 fatcat:4eyihunr7ngvzfciezw7qxlxkq

Knowledge Grounded Multimodal Dialog Generation in Task-oriented Settings

Deeksha Varshney, Asif Ekbal, Anushkha Singh
2021 Pacific Asia Conference on Language, Information and Computation  
We propose the task of knowledge grounded response generation in a multimodal task-oriented dialog setting.  ...  Knowledge-grounded dialogue generation is the process of formulating an informed response based on both the conversation context and external knowledge.  ...  ., 2019) have also been used to generate responses in task-oriented dialogues.  ... 
dblp:conf/paclic/VarshneyS21 fatcat:cnrpyknugzbthnpc64x7tgjx34

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation [article]

Wubo Li, Wei Zou, Xiangang Li
2019 arXiv   pre-print
Combining TCT with the Multimodal Transformer Network (MTN), we evaluate MTN-TCT on video-grounded dialogue, which uses multimodality.  ...  Multimodalities provide more promising performance than unimodality in most tasks. However, learning the semantics of the representations from multimodalities efficiently is extremely challenging.  ...  We evaluated MTN-TCT on the AVSD dataset, which uses multimodalities to generate question responses. The proposed approach reports new state-of-the-art performance on video-grounded dialogue.  ... 
arXiv:1911.05186v1 fatcat:pwcnuz2qwfeghl2yts6yc7jpuu

Multimodal Conversational AI: A Survey of Datasets and Approaches [article]

Anirudh Sundar, Larry Heck
2022 arXiv   pre-print
Finally, we identify multimodal co-learning as a promising direction for multimodal conversational AI research.  ...  This paper motivates, defines, and mathematically formulates the multimodal conversational research objective.  ...  Multimodal disambiguation and response generation are challenges associated with fusion that determine whether available multimodal inputs are sufficient for a direct response or if follow-up queries are  ... 
arXiv:2205.06907v1 fatcat:u6kehgeeq5aefdlvv5bpbwsvsa
Showing results 1 — 15 out of 16,778 results