16 Hits in 7.5 sec

Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge [article]

Sumithra Bhakthavatsalam, Daniel Khashabi, Tushar Khot, Bhavana Dalvi Mishra, Kyle Richardson, Ashish Sabharwal, Carissa Schoenick, Oyvind Tafjord, Peter Clark
2021 arXiv   pre-print
We present the ARC-DA dataset, a direct-answer ("open response", "freeform") version of the ARC (AI2 Reasoning Challenge) multiple-choice dataset.  ...  The ARC-DA dataset addresses these concerns by converting questions to direct-answer format using a combination of crowdsourcing and expert review.  ...  The TPU machines for conducting experiments were provided by Google.  ... 
arXiv:2102.03315v1 fatcat:ltje5fptcnb4fatod6qxhviotm

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula (+1 others)
2018 Proceedings of the Workshop on Machine Reading for Question Answering  
The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set.  ...  We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset.  ...  Acknowledgments We would like to thank Salim Roukos for helpful suggestions on the annotation process, and Daniel Khashabi for assistance with SemanticILP.  ... 
doi:10.18653/v1/w18-2607 dblp:conf/acl/BoratkoPMYDMCFK18 fatcat:om56vgxk4zaqjlruvzntwqrr7e

A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset [article]

Michael Boratko, Harshit Padigela, Divyendra Mikkilineni, Pritish Yuvraj, Rajarshi Das, Andrew McCallum, Maria Chang, Achille Fokoue-Nkoutche, Pavan Kapanipathi, Nicholas Mattei, Ryan Musa, Kartik Talamadupula, Michael Witbrock
2019 arXiv   pre-print
The recent work of Clark et al. introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set.  ...  We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset.  ...  Acknowledgments We would like to thank Salim Roukos for helpful suggestions on the annotation process, and Daniel Khashabi for assistance with SemanticILP.  ... 
arXiv:1806.00358v2 fatcat:kvchhihijvfplgptl53vcforsq

Comparing Test Sets with Item Response Theory [article]

Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
2021 arXiv   pre-print
Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.  ...  We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.  ...  Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457.  ... 
arXiv:2106.00840v1 fatcat:holqdsprhbb5fhdzvp3dzltd7m

Benchmarking Machine Reading Comprehension: A Psychological Perspective [article]

Saku Sugawara, Pontus Stenetorp, Akiko Aizawa
2021 arXiv   pre-print
validity by shortcut-proof questions and explanation as a part of the task design.  ...  However, the conventional task design of MRC lacks explainability beyond the model interpretation, i.e., reading comprehension by a model cannot be explained in human terms.  ...  Acknowledgments The authors would like to thank Xanh Ho for helping create the dataset list and the anonymous reviewers for their insightful comments.  ... 
arXiv:2004.01912v2 fatcat:lyypngwm4vbk7igfcjfmhkn5ja

Hybrid Autoregressive Inference for Scalable Multi-hop Explanation Regeneration [article]

Marco Valentino, Mokanarangan Thayaparan, Deborah Ferreira, André Freitas
2021 arXiv   pre-print
Further analyses on semantic drift and multi-hop question answering reveal that the proposed hybridisation boosts the quality of the most challenging explanations, contributing to improved performance  ...  To enable complex multi-hop reasoning at scale, this paper focuses on bi-encoder architectures, investigating the problem of scientific explanation regeneration at the intersection of dense and sparse  ...  Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge.  ... 
arXiv:2107.11879v2 fatcat:zd7non3tbvaqpkxgwx5yujjdzm

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP [article]

Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
2021 arXiv   pre-print
Our analysis reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.  ...  We also observe that the selection of upstream learning tasks can significantly influence few-shot performance on unseen tasks, asking further analysis on task similarity and transferability.  ...  Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. ArXiv, abs/1803.05457. Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady. 2019.  ... 
arXiv:2104.08835v2 fatcat:xnhrmmsmyzb4fjo7ealrw2vnka

Low-resource Learning with Knowledge Graphs: A Comprehensive Survey [article]

Jiaoyan Chen and Yuxia Geng and Zhuo Chen and Jeff Z. Pan and Yuan He and Wen Zhang and Ian Horrocks and Huajun Chen
2021 arXiv   pre-print
We eventually discussed some challenges and future directions on aspects such as new learning and reasoning paradigms, and the construction of high quality KGs.  ...  dividing them into different paradigms such as the mapping-based, the data augmentation, the propagation-based and the optimization-based.  ...  Think you have solved direct-answer question answering? Try ARC-DA, the direct-answer AI2 reasoning challenge. arXiv preprint arXiv:2102.03315 (2021).  ... 
arXiv:2112.10006v3 fatcat:wkz6gjx4r5gvlhh673p3rqsmgi

Short-term frequency stability

E.J. Baghdady, R.N. Lincoln, J.A. Mullen, B.D. Nelin
1965 Proceedings of the IEEE  
ACKNOWLEDGMENT The authors are pleased to acknowledge the contributions-direct  ...  -I think my question has already been answered to a certain extent. Dr.  ...  minimizing L with respect to a (i.e., set dL/da = 0 and solve for a).  ... 
doi:10.1109/proc.1965.4489 fatcat:m3xa6eus7zghzl3xeo5gmhgi44

Comparing Test Sets with Item Response Theory

Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
2021 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)   unpublished
Recent years have seen numerous NLP datasets introduced to evaluate the performance of fine-tuned models on natural language understanding tasks.  ...  We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models.  ...  Think You Have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457.  ... 
doi:10.18653/v1/2021.acl-long.92 fatcat:5t5ia47wnbaeld6ghe732icdiy

Few-shot Learning with Multilingual Language Models [article]

Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer (+9 others)
2021 arXiv   pre-print
directions.  ...  On the FLORES-101 machine translation benchmark, our model outperforms GPT-3 on 171 out of 182 translation directions with 32 training examples, while surpassing the official supervised baseline in 45  ...  Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge.  ... 
arXiv:2112.10668v1 fatcat:ehexgbyr5jfetimihdd66sxdtm

Benchmarking Machine Reading Comprehension: A Psychological Perspective

Saku Sugawara, Pontus Stenetorp, Akiko Aizawa
2021 Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume   unpublished
validity by shortcut-proof questions and explanation as a part of the task design.  ...  However, the conventional task design of MRC lacks explainability beyond the model interpretation, i.e., reading comprehension by a model cannot be explained in human terms.  ...  Acknowledgments The authors would like to thank Xanh Ho for helping create the dataset list and the anonymous reviewers for their insightful comments.  ... 
doi:10.18653/v1/2021.eacl-main.137 fatcat:24miopcks5ewvbrpgqmydnw7j4

KILT: a Benchmark for Knowledge Intensive Language Tasks

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel (+1 others)
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies   unpublished
Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources.  ...  We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and  ...  Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019.  ... 
doi:10.18653/v1/2021.naacl-main.200 fatcat:6mkfryzj3jenhmpppjiozjizgu

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP

Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
Our analysis reveals that the few-shot learning ability on unseen tasks can be improved via an upstream learning stage using a set of seen tasks.  ...  We also observe that the selection of upstream learning tasks can significantly influence few-shot performance on unseen tasks, asking further analysis on task similarity and transferability.  ...  Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. ArXiv, abs/1803.05457. Arman Cohan, Waleed Ammar, Madeleine van Zuylen, and Field Cady. 2019.  ... 
doi:10.18653/v1/2021.emnlp-main.572 fatcat:oqcntk47tfbl7ezsb2oy4353xe

User experience design and agile development : integration as an on-going achievement in practice

Jennifer Ferreira
2012
Agile development and UX design have roots in different disciplines and practitioners have to reconcile their perspectives on developing software if they are to work together.  ...  The findings from the analysis of accounts of practice from the literature show that integration is achieved with the right tools, techniques and processes that coordinate between the tasks of the developers  ...  evidence to answer a specific question.  ... 
doi:10.21954/ou.ro.0000d3dc fatcat:v6tztj7zj5hqlfuwxtgtqjzceu
Showing results 1 — 15 out of 16 results