
MultiModalQA: Complex Question Answering over Text, Tables and Images [article]

Alon Talmor, Ori Yoran, Amnon Catav, Dan Lahav, Yizhong Wang, Akari Asai, Gabriel Ilharco, Hannaneh Hajishirzi, Jonathan Berant
2021 arXiv   pre-print
In this paper, we present MultiModalQA (MMQA): a challenging question answering dataset that requires joint reasoning over text, tables and images.  ...  We create MMQA using a new framework for generating complex multi-modal questions at scale, harvesting tables from Wikipedia, and attaching images and text paragraphs using entities that appear in each  ...  an image associated with them in the Wikipedia dump; (iii) Wikipedia editors do not always link table cells to WikiEntities; (iv) some images are not representative of the WikiEntities they depict.  ... 
arXiv:2104.06039v1

WebQA: Multihop and Multimodal QA [article]

Yingshan Chang, Mridu Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk
2022 arXiv   pre-print
This is the behavior we should be expecting from IoT devices and digital assistants. Existing work prefers to assume that a model can either reason about knowledge in images or in text.  ...  Scaling Visual Question Answering (VQA) to the open-domain and multi-hop nature of web searches requires fundamental advances in visual representation learning, knowledge aggregation, and language generation.  ...  MultiModalQA [3] made the first foray into complex questions that require reasoning over snippets, tables and images.  ... 
arXiv:2109.00590v4

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding [article]

Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avirup Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji
2022 arXiv   pre-print
...  text to answer the question.  ...  Recently, there has been an increasing interest in building question answering (QA) models that reason across multiple modalities, such as text and images.  ...  Ernest Davis (NYU) for insightful advice and feedback on our data set and paper. This research is based upon work supported in part by U.S. DARPA AIDA Program No. FA8750-18-2-0014 and U.S.  ... 
arXiv:2112.10728v2

FeTaQA: Free-form Table Question Answering

Linyong Nan, Chiachun Hsieh, Ziming Mao, Xi Victoria Lin, Neha Verma, Rui Zhang, Wojciech Kryściński, Nick Schoelkopf, Riley Kong, Xiangru Tang, Mutethia Mutuma, Ben Rosand (+5 others)
2022 Transactions of the Association for Computational Linguistics  
...  daily basis: understand the question and table; retrieve, integrate, and infer; and conduct text planning and surface realization to generate an answer.  ...  Existing table question answering datasets contain abundant factual questions that primarily evaluate a QA system's comprehension of query and tabular data.  ...  Acknowledgments: The authors would like to thank the anonymous reviewers and the Action Editor for their valuable discussions and feedback.  ... 
doi:10.1162/tacl_a_00446

Multi-Instance Training for Question Answering Across Table and Linked Text [article]

Vishwajeet Kumar, Saneem Chemmengath, Yash Gupta, Jaydeep Sen, Samarth Bharadwaj, Soumen Chakrabarti
2021 arXiv   pre-print
Often, a question is best answered by matching its parts to either table cell contents or unstructured text spans, and extracting answers from either source.  ...  To reduce cognitive burden, training instances usually include just the question and answer, the latter matching multiple table rows and text passages.  ...  MultiModalQA (Talmor et al. 2021) is another question answering benchmark which introduces images as a new modality, in addition to table and text, for complex question answering in the open domain.  ... 
arXiv:2112.07337v1
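The weakly supervised setup this abstract describes — training instances that pair only a question with an answer string, where the answer may match several table rows and text passages at once — can be illustrated with a small sketch. This is a hypothetical toy, not the paper's implementation; the function name `candidate_sources` and the example data are assumptions:

```python
def candidate_sources(answer: str, table: list[list[str]], passages: list[str]):
    """Collect every table cell and every passage offset that contains the gold
    answer string. Multi-instance training treats these as a bag of candidate
    supervision signals, because the true source is not annotated."""
    a = answer.lower()
    cells = [(r, c) for r, row in enumerate(table)
                    for c, cell in enumerate(row) if a in cell.lower()]
    spans = [(i, p.lower().index(a)) for i, p in enumerate(passages)
             if a in p.lower()]
    return cells, spans

# Toy example: the answer "1889" occurs both in a table cell and in a passage,
# so the training signal is ambiguous between the two sources.
table = [["Eiffel Tower", "1889"], ["Louvre Pyramid", "1989"]]
passages = ["The Eiffel Tower opened in 1889 for the World's Fair."]
cells, spans = candidate_sources("1889", table, passages)
# cells -> [(0, 1)]; spans -> [(0, 27)]
```

A multi-instance objective would then score all candidates and, e.g., maximize the probability of the best-scoring one rather than committing to a single gold location.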

Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models [article]

Bogdan Kostić, Julian Risch, Timo Möller
2021 arXiv   pre-print
However, some questions cannot be answered by text alone but require information stored in tables.  ...  In this paper, we present an approach for retrieving both texts and tables relevant to a question by jointly encoding texts, tables and questions into a single vector space.  ...  Acknowledgements We would like to thank Jonathan Herzig and Julian Eisenschlos for taking the time to discuss ideas with us and to give early feedback on experiment results.  ... 
arXiv:2108.04049v2
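The joint vector space described in this abstract can be illustrated with a deliberately tiny dense-retrieval sketch: questions, text passages, and linearized tables are all embedded into one space and ranked by inner product. The hashing `embed` function below is only a stand-in for the learned tri-encoder; all names and data are assumptions for illustration:

```python
import hashlib
import math

DIM = 64  # toy embedding size; a real tri-encoder uses learned transformers

def embed(text: str) -> list[float]:
    """Toy stand-in for a learned encoder: hash each token into a fixed-size
    count vector, then L2-normalize. All modalities share this vector space."""
    vec = [0.0] * DIM
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def linearize_table(rows: list[dict]) -> str:
    """Flatten a table into 'header is cell' statements so it can be encoded
    like ordinary text."""
    return " ; ".join(f"{k} is {v}" for row in rows for k, v in row.items())

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank all documents (texts and linearized tables alike) by dot product
    with the question embedding and return the top k."""
    q = embed(question)
    scored = [(sum(a * b for a, b in zip(q, embed(d))), d) for d in docs]
    scored.sort(reverse=True)
    return [d for _, d in scored[:k]]

# Index a mix of text passages and one linearized table in the same space.
corpus = [
    "Paris is the capital of France.",
    linearize_table([{"city": "Paris", "population": "2.1 million"}]),
    "The Eiffel Tower was completed in 1889.",
]
top = retrieve("what is the population of Paris", corpus)
```

A real system would train the three encoders with a contrastive objective over question–document pairs; the shared-space ranking step, however, looks essentially like this dot-product search.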

QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension [article]

Anna Rogers, Matt Gardner, Isabelle Augenstein
2021 arXiv   pre-print
Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years.  ...  We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy.  ...  For instance, HybridQA [47] and TAT-QA [292] target the information combined from text and tables, and MultiModalQA [250] adds images to that setting.  ... 
arXiv:2107.12708v1

MIMOQA: Multimodal Input Multimodal Output Question Answering

Hrituraj Singh, Anshul Nasery, Denil Mehta, Aishwarya Agarwal, Jatin Lamba, Balaji Vasan Srinivasan
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Darryl Hannan, Akshay Jain, and Mohit Bansal. 2020. ManyModalQA: Modality disambiguation and QA over diverse inputs. In AAAI, pages 7879-7886.  ...  Multimodal research has picked up significantly in the space of question answering, with the task being extended to visual question answering, chart question answering, as well as multimodal-input question answering.  ...  Table 3: Results showing the performance of E&M and MExBERT over the image modality of the multimodal answer, as measured against the proxy scores over the test set.  ... 
doi:10.18653/v1/2021.naacl-main.418