
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE [article]

Luxi Xing, Yuqiang Xie, Yue Hu, Wei Peng
2020 arXiv   pre-print
This paper introduces our systems for the first two subtasks of SemEval Task4: Commonsense Validation and Explanation.  ...  To clarify the intention for judgment and inject contrastive information for selection, we propose the input reconstruction strategy with prompt templates.  ...  Acknowledgements We thank the anonymous reviewers for their insightful feedback.  ... 
arXiv:2007.00924v1 fatcat:d2ahxv3yxzcztf7cs3jn7flhgy

Generated Knowledge Prompting for Commonsense Reasoning [article]

Jiacheng Liu, Alisa Liu, Ximing Lu, Sean Welleck, Peter West, Ronan Le Bras, Yejin Choi, Hannaneh Hajishirzi
2022 arXiv   pre-print
Generated knowledge prompting highlights large-scale language models as flexible sources of external knowledge for improving commonsense reasoning. Our code is available at  ...  reasoning tasks, achieving state-of-the-art results on numerical commonsense (NumerSense), general commonsense (CommonsenseQA 2.0), and scientific commonsense (QASC) benchmarks.  ...  We thank Daniel Khashabi, Vered Shwartz, Bhargavi Paranjape, Bill Yuchen Lin, Jonathan Herzig for their help with the experiments and evaluation.  ... 
arXiv:2110.08387v2 fatcat:2qwjrwqesreqdnku7yejvux6i4

On-the-Fly Attention Modulation for Neural Generation [article]

Yue Dong, Chandra Bhagavatula, Ximing Lu, Jena D. Hwang, Antoine Bosselut, Jackie Chi Kit Cheung, Yejin Choi
2021 arXiv   pre-print
Automatic and human evaluation results on three text generation benchmarks demonstrate that attention modulation helps LMs generate text with enhanced fluency, creativity, and commonsense reasoning, in  ...  Our analyses on sentence-level attention patterns in LMs reveal that neural degeneration may be associated with insufficient learning of task-specific characteristics by the attention mechanism.  ...  Pacific (N66001-19-2-4031), the Canada CIFAR AI Chair program, the Natural Sciences and Engineering Research Council of Canada (NSERC), Intel Labs Cognitive Computing Research, and the Allen Institute for  ... 
arXiv:2101.00371v2 fatcat:3zpybnunxfgonfwyb5u5ete6jy

ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning [article]

Swarnadeep Saha, Prateek Yadav, Lisa Bauer, Mohit Bansal
2021 arXiv   pre-print
In this work, we present ExplaGraphs, a new generative and structured commonsense-reasoning task (and an associated dataset) of explanation graph generation for stance prediction.  ...  Recent commonsense-reasoning tasks are typically discriminative in nature, where a model answers a multiple-choice question for a certain context.  ...  Acknowledgements We thank the reviewers as well as Yejin Choi, Peter Clark, Peter Hase, Hyounghun Kim, and Jie Lei for their helpful feedback, and the annotators for their time and effort.  ... 
arXiv:2104.07644v3 fatcat:7sgvwwriejepriovs5dkmrtq3i

Explain Yourself! Leveraging Language Models for Commonsense Reasoning

Nazneen Fatema Rajani, Bryan McCann, Caiming Xiong, Richard Socher
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
We collect human explanations for commonsense reasoning in the form of natural language sequences and highlighted annotations in a new dataset called Common Sense Explanations (CoS-E).  ...  We further study commonsense reasoning in DNNs using both human and auto-generated explanations including transfer to out-of-domain tasks.  ...  Acknowledgements We would like to thank Melvin Gruesbeck for the illustration of CAGE in Figure 1. We also thank the anonymous reviewers for their feedback.  ... 
doi:10.18653/v1/p19-1487 dblp:conf/acl/RajaniMXS19 fatcat:7ezmngl5vfektlfqwm3ybuf7eu

SemEval-2020 Task 4: Commonsense Validation and Explanation [article]

Cunxiang Wang, Shuailong Liang, Yili Jin, Yilong Wang, Xiaodan Zhu, Yue Zhang
2020 arXiv   pre-print
In this paper, we present SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), which includes three subtasks, aiming to evaluate whether a system can distinguish a natural language statement  ...  The dataset used in our task can be found at https://github.com/wangcunxiang/SemEval2020-Task4-Commonsense-Validation-and-Explanation; The leaderboard can be found at https://competitions.codalab.org/  ...  Acknowledgements This work is supported by the National Science Foundation of China (Grant No. 61976180), the Westlake University, and the Bright Dream Joint Institute for Intelligent Robotics.  ... 
arXiv:2007.00236v2 fatcat:rp764ih7krf2rnbtfx6zi4mdpy

Probing Commonsense Explanation in Dialogue Response Generation [article]

Pei Zhou, Pegah Jandaghi, Bill Yuchen Lin, Justin Cho, Jay Pujara, Xiang Ren
2021 arXiv   pre-print
We formalize the problem by framing commonsense as a latent variable in the RG task and using explanations for responses as textual form of commonsense.  ...  of CSR for RG.  ...  Acknowledgments We thank anonymous reviewers for providing insightful feedback along with Brendan Kennedy, Peifeng Wang, and members from INK and JAUNTS lab.  ... 
arXiv:2104.09574v4 fatcat:h4rcyvrizfgeppib77sbt23of4

Aligning AI With Shared Human Values [article]

Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt
2021 arXiv   pre-print
We introduce the ETHICS dataset, a new benchmark that spans concepts in justice, well-being, duties, virtues, and commonsense morality.  ...  Funding for the ETHICS dataset was generously provided by the Long-Term Future Fund. This research was also supported by the NSF Frontier Award 1804794.  ...  For the Justice and Deontology task, we use this prompt template and use 32 examples to perform few-shot classification. prompt += "Question: Would most people believe this reasonable or unreasonable to  ... 
arXiv:2008.02275v5 fatcat:dcq5jt2nibgedajzxsibnpf2xq
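The excerpt above shows a truncated fragment of the prompt template used for few-shot classification on the Justice and Deontology tasks. As a rough illustration of the general pattern it hints at (labeled scenarios concatenated before the query, with the model asked to complete the final answer), here is a minimal sketch; the function name, template wording, and labels are assumptions for illustration, not the paper's actual code.

```python
def build_fewshot_prompt(examples, query):
    """Assemble a few-shot classification prompt.

    examples: list of (scenario, label) pairs shown before the query
    query: the scenario the language model should classify
    """
    prompt = ""
    for scenario, label in examples:
        # Each exemplar pairs the question template with its gold label.
        prompt += (
            'Question: Would most people believe this reasonable or '
            'unreasonable to say? "{}"\n'.format(scenario)
        )
        prompt += "Answer: {}\n\n".format(label)
    # The query repeats the template but leaves the answer for the model.
    prompt += (
        'Question: Would most people believe this reasonable or '
        'unreasonable to say? "{}"\n'.format(query)
    )
    prompt += "Answer:"
    return prompt

demo = build_fewshot_prompt(
    [("I deserve a raise because I worked overtime.", "reasonable")],
    "I deserve the moon because I asked for it.",
)
```

In the paper's setting, 32 such exemplars precede the query, and the language model's completion after the final "Answer:" is read off as the predicted label.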

Chain of Thought Prompting Elicits Reasoning in Large Language Models [article]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou
2022 arXiv   pre-print
Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks.  ...  For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned  ...  Acknowledgements We thank Jacob Devlin, Claire Cui, Andrew Dai, and Ellie Pavlick for providing feedback on the paper.  ... 
arXiv:2201.11903v4 fatcat:agbe4tdczjdyhl3txy2es5c3bm

A Review on Language Models as Knowledge Bases [article]

Badr AlKhamissi, Millicent Li, Asli Celikyilmaz, Mona Diab, Marjan Ghazvininejad
2022 arXiv   pre-print
The resulting LM can be probed for different kinds of knowledge, thus acting as a KB. This has a major advantage over traditional KBs in that it requires no human supervision.  ...  Acknowledgements Special thanks to Siddharth Verma for many helpful discussions and comments on the paper, and to Ahmed El-Kholy for the graphic in Figure 1.  ...  Edit Reasoning Explainability tackles this by taking a gradient-based search to find the appropriate prompt for a specific task.  ... 
arXiv:2204.06031v1 fatcat:nrixk5zcrffkdmhrlwifnga6iu

E-KAR: A Benchmark for Rationalizing Natural Language Analogical Reasoning [article]

Jiangjie Chen, Rui Xu, Ziquan Fu, Wei Shi, Zhongqiao Li, Xinbo Zhang, Changzhi Sun, Lei Li, Yanghua Xiao, Hao Zhou
2022 arXiv   pre-print
Empirical results suggest that this benchmark is very challenging for some state-of-the-art models for both explanation generation and analogical question answering tasks, which invites further research  ...  Holding the belief that models capable of reasoning should be right for the right reasons, we propose a first-of-its-kind Explainable Knowledge-intensive Analogical Reasoning benchmark (E-KAR).  ...  Acknowledgement We thank the anonymous reviewers for their valuable suggestions. We also thank Ruxin Yu for the logo design.  ... 
arXiv:2203.08480v1 fatcat:mjyaywxhzjcr7bqrvame6lovim

Language Models are General-Purpose Interfaces [article]

Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
2022 arXiv   pre-print
Though there is a big convergence in terms of architecture, most pretrained models are typically still developed for specific tasks or modalities.  ...  A collection of pretrained encoders perceive diverse modalities (such as vision and language), and they dock with a language model that plays the role of a universal task layer.  ...  We apply an "it is [entailment label] because [explanation]." prompt for generative finetuning.  ... 
arXiv:2206.06336v1 fatcat:m63fbkoctzhbnfl3vldtb42ikq

A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge [article]

Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
2022 arXiv   pre-print
The Visual Question Answering (VQA) task aspires to provide a meaningful testbed for the development of AI models that can jointly reason over visual and natural language inputs.  ...  In contrast to the existing knowledge-based VQA datasets, the questions generally cannot be answered by simply querying a knowledge base, and instead require some form of commonsense reasoning about the  ...  Other datasets have considered knowledge-based question answering for a sitcom [14] and by using web queries [9] . Explanation / Reasoning VQA.  ... 
arXiv:2206.01718v1 fatcat:pvjlqj3pzrd3zjr5epwbs4dneu

Automatic Story Generation: Challenges and Attempts [article]

Amal Alabdulkarim, Siyan Li, Xiangyu Peng
2021 arXiv   pre-print
., 2019b) is a large-scale benchmark for commonsense reasoning about social situations, which provides 38k multiple choice questions.  ...  Schwartz et al. (2017) and Trinh and Le (2018) demonstrate a similar approach to using language models for tasks requiring commonsense, such as the Story Cloze Task and the Winograd Schema Challenge,  ... 
arXiv:2102.12634v1 fatcat:b67pi4zy5fc4dp4edidecwo54a

Do Fine-tuned Commonsense Language Models Really Generalize? [article]

Mayank Kejriwal, Ke Shen
2020 arXiv   pre-print
Recently, transformer-based methods such as RoBERTa and GPT-3 have led to significant experimental advances in natural language processing tasks such as question answering and commonsense reasoning.  ...  Since these are commonsense benchmarks, a model that generalizes on commonsense reasoning should not experience much performance loss across multiple commonsense benchmarks.  ...  to test an AI system's capability to apply abductive reasoning and common sense to form possible explanations for a given set of observations.  ... 
arXiv:2011.09159v1 fatcat:pa4ruffdffaunakztgves7tdoq
Showing results 1–15 of 4,600