37 Hits in 8.1 sec

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs [article]

Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
2020 arXiv   pre-print
We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and  ...  We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question  ...  Acknowledgments The authors thank Sarah Pratt for her assistance with the grounded situation recognizer, Amandalynne Paullada, members of the AllenNLP team, and anonymous reviewers for helpful feedback  ... 
arXiv:2010.07526v1 fatcat:6vafbtt34rccrfly327tjr4pwe

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs

Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
2020 Findings of the Association for Computational Linguistics: EMNLP 2020   unpublished
We present RATIONALE VT TRANSFORMER, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and  ...  We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question  ...  Acknowledgments The authors thank Sarah Pratt for her assistance with the grounded situation recognizer, Amandalynne Paullada, members of the AllenNLP team, and anonymous reviewers for helpful feedback  ... 
doi:10.18653/v1/2020.findings-emnlp.253 fatcat:qwnpmjh7hbbflppfcip5yepu4q

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [article]

Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
2021 arXiv   pre-print
Discriminative tasks are limiting because they fail to adequately evaluate the model's ability to reason and explain predictions with underlying commonsense knowledge.  ...  A significant 79% of our graphs contain external commonsense nodes with diverse structures and reasoning depths.  ...  Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs.  ... 
arXiv:2104.07644v3 fatcat:7sgvwwriejepriovs5dkmrtq3i

The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning [article]

Jack Hessel and Jena D. Hwang and Jae Sung Park and Rowan Zellers and Chandra Bhagavatula and Anna Rohrbach and Kate Saenko and Yejin Choi
2022 arXiv   pre-print
We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes, and iii) compare plausible inferences to match  ...  Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image.  ...  Visual understanding and reasoning that go beyond descriptive content have gained increasing attention, including work on visual and analogical reasoning [3, 21, 43, 77], scene graphs and semantics  ... 
arXiv:2202.04800v1 fatcat:adba3inkdnebpikoquccjwoiji

Rationale-Inspired Natural Language Explanations with Commonsense [article]

Bodhisattwa Prasad Majumder, Oana-Maria Camburu, Thomas Lukasiewicz, Julian McAuley
2021 arXiv   pre-print
Extractive rationales (i.e., subsets of input features) and natural language explanations (NLEs) are two predominant types of explanations for machine learning models.  ...  While NLEs can be more comprehensive than extractive rationales, machine-generated NLEs have been shown to fall short in terms of commonsense knowledge.  ...  Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs.  ...  Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John F. Canny, and Zeynep Akata. 2018.  ... 
arXiv:2106.13876v2 fatcat:dq5ibj3h6zakdfrg7eal3hfxme

A Roadmap for Big Model [article]

Sha Yuan, Hanyu Zhao, Shuai Zhao, Jiahong Leng, Yangxiao Liang, Xiaozhi Wang, Jifan Yu, Xin Lv, Zhou Shao, Jiaao He, Yankai Lin, Xu Han (+88 others)
2022 arXiv   pre-print
 ...  Commonsense Reasoning, Reliability & Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research.  ...  With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks becomes a popular paradigm.  ...  GRF [169] proposes to incorporate big models with dynamic multi-hop reasoning on multi-relational paths extracted from external commonsense knowledge graphs, which can promote language generation with  ... 
arXiv:2203.14101v4 fatcat:rdikzudoezak5b36cf6hhne5u4

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods

Aditya Mogadala, Marimuthu Kalimuthu, Dietrich Klakow
2021 The Journal of Artificial Intelligence Research  
This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing.  ...  Furthermore, we also provide some potential future directions in this field of research with an anticipation that this survey stimulates innovative thoughts and ideas to address the existing challenges  ...  We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript.  ... 
doi:10.1613/jair.1.11688 fatcat:kvfdrg3bwrh35fns4z67adqp6i

MERLOT: Multimodal Neural Script Knowledge Models [article]

Rowan Zellers, Ximing Lu, Jack Hessel, Youngjae Yu, Jae Sung Park, Jize Cao, Ali Farhadi, Yejin Choi
2021 arXiv   pre-print
multimodal reasoning, from the recognition to cognition level.  ...  On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised  ...  From recognition to cognition: Visual commonsense reasoning.  ... 
arXiv:2106.02636v3 fatcat:mrj2t3yuanbdzhsujshtky4enq

Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods [article]

Aditya Mogadala and Marimuthu Kalimuthu and Dietrich Klakow
2020 arXiv   pre-print
This success can be partly attributed to the advancements made in the sub-fields of AI such as Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP).  ...  Furthermore, we also provide some potential future directions in this field of research with an anticipation that this survey brings in innovative thoughts and ideas to address the existing challenges  ...  We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript.  ... 
arXiv:1907.09358v2 fatcat:4fyf6kscy5dfbewll3zs7yzsuq

InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining [article]

Junyang Lin, An Yang, Yichang Zhang, Jie Liu, Jingren Zhou, Hongxia Yang
2021 arXiv   pre-print
We pretrain the model with three pretraining tasks, including masked segment modeling (MSM), masked region modeling (MRM) and image-text matching (ITM); and finetune the model on a series of vision-and-language  ...  We pretrain the Chinese InterBERT on our proposed dataset of 3.1M image-text pairs from the mobile Taobao, the largest Chinese e-commerce platform.  ...  Visual Commonsense Reasoning Visual commonsense reasoning (VCR) is a task connected with cognition and requires visual understanding [59] .  ... 
arXiv:2003.13198v4 fatcat:6rp3lxy7fnbmxft5kfm5imuisq

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning

Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs.  ...  improved interpretability in rule reasoning.  ...  In Proceedings of the 2021 Conference of the North American  ... 
doi:10.18653/v1/2021.emnlp-main.609 fatcat:t4qgmvbf6fh67ov3xzqoawifre

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey [article]

Julian Wörmann, Daniel Bogdoll, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Tobias Gleißner, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels (+34 others)
2022 arXiv   pre-print
The reasons for this are manifold and range from time and cost constraints to ethical considerations.  ...  This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge.  ...  While knowledge extraction in the context of regulations and norms from legal domain is the focus in Section 6.3, natural language as accompanying explanation to visual stimuli is the goal in visual question  ... 
arXiv:2205.04712v1 fatcat:u2bgxr2ctnfdjcdbruzrtjwot4

e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks [article]

Maxime Kayser, Oana-Maria Camburu, Leonard Salewski, Cornelius Emde, Virginie Do, Zeynep Akata, Thomas Lukasiewicz
2021 arXiv   pre-print
Recently, there has been an increasing number of efforts to introduce models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks.  ...  It spans four models and three datasets and both automatic metrics and human evaluation are used to assess model-generated explanations. e-SNLI-VE is currently the largest existing VL dataset with NLEs  ...  Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs.  ... 
arXiv:2105.03761v2 fatcat:tujds362fjca3akncfhzlrumi4

Survey of explainable machine learning with visual and granular methods beyond quasi-explanations [article]

Boris Kovalerchuk
2020 arXiv   pre-print
This paper surveys visual methods of explainability of Machine Learning (ML) with focus on moving from quasi-explanations that dominate in ML to domain-specific explanation supported by granular visuals  ...  The paper includes results on theoretical limits to preserve n-D distances in lower dimensions, based on the Johnson-Lindenstrauss lemma, point-to-point and point-to-graph GLC approaches, and real-world  ...  to a domain expert; • Give an explanation comprehensible to humans in (i) natural language and in (ii) easy to understand representations; • Give an explanation to humans using domain knowledge not ML  ... 
arXiv:2009.10221v1 fatcat:ir3u3jmqjras3aisylyvlmosue

The AI Index 2021 Annual Report [article]

Daniel Zhang, Saurabh Mishra, Erik Brynjolfsson, John Etchemendy, Deep Ganguli, Barbara Grosz, Terah Lyons, James Manyika, Juan Carlos Niebles, Michael Sellitto, Yoav Shoham, Jack Clark (+1 others)
2021 arXiv   pre-print
The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence.  ...  This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford  ...  Figure: Visual Commonsense Reasoning (VCR) Task. The Visual Commonsense Reasoning (VCR) task, first introduced in 2018, asks machines to answer a challenging question about a given image and justify  ... 
arXiv:2103.06312v1 fatcat:52qwvzv7jndxzaagyiro6koyza