A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
[article]
2020
arXiv
pre-print
We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and ...
We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question ...
Acknowledgments: The authors thank Sarah Pratt for her assistance with the grounded situation recognizer, Amandalynne Paullada, members of the AllenNLP team, and anonymous reviewers for helpful feedback ...
arXiv:2010.07526v1
fatcat:6vafbtt34rccrfly327tjr4pwe
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
2020
Findings of the Association for Computational Linguistics: EMNLP 2020
unpublished
We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and ...
We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question ...
Acknowledgments: The authors thank Sarah Pratt for her assistance with the grounded situation recognizer, Amandalynne Paullada, members of the AllenNLP team, and anonymous reviewers for helpful feedback ...
doi:10.18653/v1/2020.findings-emnlp.253
fatcat:qwnpmjh7hbbflppfcip5yepu4q
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning
[article]
2021
arXiv
pre-print
Discriminative tasks are limiting because they fail to adequately evaluate the model's ability to reason and explain predictions with underlying commonsense knowledge. ...
A significant 79% of our graphs contain external commonsense nodes with diverse structures and reasoning depths. ...
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. ...
arXiv:2104.07644v3
fatcat:7sgvwwriejepriovs5dkmrtq3i
The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning
[article]
2022
arXiv
pre-print
We evaluate the capacity of models to: i) retrieve relevant inferences from a large candidate corpus; ii) localize evidence for inferences via bounding boxes, and iii) compare plausible inferences to match ...
Humans have remarkable capacity to reason abductively and hypothesize about what lies beyond the literal content of an image. ...
Visual understanding and reasoning that go beyond descriptive content have gained increasing attention, including work on visual and analogical reasoning [3, 21, 43, 77], scene graphs and semantics ...
arXiv:2202.04800v1
fatcat:adba3inkdnebpikoquccjwoiji
Rationale-Inspired Natural Language Explanations with Commonsense
[article]
2021
arXiv
pre-print
Extractive rationales (i.e., subsets of input features) and natural language explanations (NLEs) are two predominant types of explanations for machine learning models. ...
While NLEs can be more comprehensive than extractive rationales, machine-generated NLEs have been shown to fall short in terms of commonsense knowledge. ...
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. ...
Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John F. Canny, and Zeynep Akata. 2018. ...
arXiv:2106.13876v2
fatcat:dq5ibj3h6zakdfrg7eal3hfxme
A Roadmap for Big Model
[article]
2022
arXiv
pre-print
, Commonsense Reasoning, Reliability&Security, Governance, Evaluation, Machine Translation, Text Generation, Dialogue and Protein Research. ...
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. ...
GRF [169] proposes to incorporate big models with dynamic multi-hop reasoning on multi-relational paths extracted from external commonsense knowledge graphs, which can promote language generation with ...
arXiv:2203.14101v4
fatcat:rdikzudoezak5b36cf6hhne5u4
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
2021
The Journal of Artificial Intelligence Research
This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. ...
Furthermore, we also provide some potential future directions in this field of research, in the hope that this survey will stimulate innovative thoughts and ideas to address the existing challenges ...
We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript. ...
doi:10.1613/jair.1.11688
fatcat:kvfdrg3bwrh35fns4z67adqp6i
MERLOT: Multimodal Neural Script Knowledge Models
[article]
2021
arXiv
pre-print
multimodal reasoning, from the recognition to cognition level. ...
On Visual Commonsense Reasoning, MERLOT answers questions correctly with 80.6% accuracy, outperforming state-of-the-art models of similar size by over 3%, even those that make heavy use of auxiliary supervised ...
From recognition to cognition: Visual commonsense reasoning. ...
arXiv:2106.02636v3
fatcat:mrj2t3yuanbdzhsujshtky4enq
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
[article]
2020
arXiv
pre-print
This success can be partly attributed to the advancements made in the sub-fields of AI such as Machine Learning (ML), Computer Vision (CV), and Natural Language Processing (NLP). ...
Furthermore, we also provide some potential future directions in this field of research, in the hope that this survey brings innovative thoughts and ideas to address the existing challenges ...
We extend our special thanks to Matthew Kuhn and Stephanie Lund for painstakingly proofing the whole manuscript. ...
arXiv:1907.09358v2
fatcat:4fyf6kscy5dfbewll3zs7yzsuq
InterBERT: Vision-and-Language Interaction for Multi-modal Pretraining
[article]
2021
arXiv
pre-print
We pretrain the model with three pretraining tasks, including masked segment modeling (MSM), masked region modeling (MRM) and image-text matching (ITM); and finetune the model on a series of vision-and-language ...
We pretrain the Chinese InterBERT on our proposed dataset of 3.1M image-text pairs from Mobile Taobao, the largest Chinese e-commerce platform. ...
Visual Commonsense Reasoning: Visual commonsense reasoning (VCR) is a task connected with cognition and requires visual understanding [59]. ...
arXiv:2003.13198v4
fatcat:6rp3lxy7fnbmxft5kfm5imuisq
ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning
2021
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
unpublished
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. ...
... improved interpretability in rule reasoning. In Proceedings of the 2021 Conference of the North American ...
doi:10.18653/v1/2021.emnlp-main.609
fatcat:t4qgmvbf6fh67ov3xzqoawifre
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
[article]
2022
arXiv
pre-print
The reasons for this are manifold and range from time and cost constraints to ethical considerations. ...
This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. ...
While knowledge extraction in the context of regulations and norms from the legal domain is the focus in Section 6.3, natural language as accompanying explanation to visual stimuli is the goal in visual question ...
arXiv:2205.04712v1
fatcat:u2bgxr2ctnfdjcdbruzrtjwot4
e-ViL: A Dataset and Benchmark for Natural Language Explanations in Vision-Language Tasks
[article]
2021
arXiv
pre-print
Recently, there has been an increasing number of efforts to introduce models capable of generating natural language explanations (NLEs) for their predictions on vision-language (VL) tasks. ...
It spans four models and three datasets, and both automatic metrics and human evaluation are used to assess model-generated explanations. e-SNLI-VE is currently the largest existing VL dataset with NLEs ...
Natural language rationales with full-stack visual reasoning: From pixels to semantic frames to commonsense graphs. ...
arXiv:2105.03761v2
fatcat:tujds362fjca3akncfhzlrumi4
Survey of explainable machine learning with visual and granular methods beyond quasi-explanations
[article]
2020
arXiv
pre-print
This paper surveys visual methods of explainability of Machine Learning (ML) with focus on moving from quasi-explanations that dominate in ML to domain-specific explanation supported by granular visuals ...
The paper includes results on theoretical limits to preserve n-D distances in lower dimensions, based on the Johnson-Lindenstrauss lemma, point-to-point and point-to-graph GLC approaches, and real-world ...
to a domain expert; • Give an explanation comprehensible to humans in (i) natural language and in (ii) easy to understand representations; • Give an explanation to humans using domain knowledge not ML ...
arXiv:2009.10221v1
fatcat:ir3u3jmqjras3aisylyvlmosue
The AI Index 2021 Annual Report
[article]
2021
arXiv
pre-print
The AI Index Report tracks, collates, distills, and visualizes data related to artificial intelligence. ...
This year we significantly expanded the amount of data available in the report, worked with a broader set of external organizations to calibrate our data, and deepened our connections with the Stanford ...
Visual Commonsense Reasoning (VCR) Task: The Visual Commonsense Reasoning (VCR) task, first introduced in 2018, asks machines to answer a challenging question about a given image and justify ...
arXiv:2103.06312v1
fatcat:52qwvzv7jndxzaagyiro6koyza
Showing results 1 to 15 of 37.