184 Hits in 1.3 sec

Generating Natural Adversarial Examples [article]

Zhengli Zhao, Dheeru Dua, Sameer Singh
2018 arXiv   pre-print
Due to their complex nature, it is hard to characterize the ways in which machine learning models can misbehave or be exploited when deployed. Recent work on adversarial examples, i.e. inputs with minor perturbations that result in substantially different model predictions, is helpful in evaluating the robustness of these models by exposing the adversarial scenarios where they fail. However, these malicious perturbations are often unnatural, not semantically meaningful, and not applicable to more complicated domains such as language. In this paper, we propose a framework to generate natural and legible adversarial examples that lie on the data manifold, by searching in semantic space of dense and continuous data representation, utilizing the recent advances in generative adversarial networks. We present generated adversaries to demonstrate the potential of the proposed approach for black-box classifiers for a wide range of applications such as image classification, textual entailment, and machine translation. We include experiments to show that the generated adversaries are natural, legible to humans, and useful in evaluating and analyzing black-box classifiers.
arXiv:1710.11342v2 fatcat:urv6jnxcgfetpfloq7gjdowdtm
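The latent-space search described in the abstract above can be caricatured in a few lines. This is a toy sketch, not the authors' implementation: the generator, classifier, and widening-radius search schedule below are all stand-ins.

```python
import random

def natural_adversary(z, generate, classify, step=0.1, max_iter=1000, seed=0):
    """Search the latent space around z for a nearby perturbation whose
    generated sample changes the black-box classifier's prediction."""
    rng = random.Random(seed)
    original = classify(generate(z))
    radius = step
    for _ in range(max_iter):
        # Sample a candidate latent vector within the current radius.
        cand = [zi + rng.uniform(-radius, radius) for zi in z]
        if classify(generate(cand)) != original:
            return cand  # adversary found: on-manifold sample, new label
        radius += step  # widen the search if nothing nearby flips
    return None

# Toy stand-ins: the "generator" is the identity map and the "classifier"
# thresholds the first coordinate, so any z crossing 1.0 flips the label.
adv = natural_adversary([0.5, 0.5], lambda z: z, lambda x: x[0] > 1.0)
```

Because candidates come from the generator, the adversary stays on (a toy version of) the data manifold rather than being an arbitrary pixel-level perturbation.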

Tricks for Training Sparse Translation Models [article]

Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan
2021 arXiv   pre-print
Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks. Sparse scaling architectures, such as BASELayers, provide flexible mechanisms for different tasks to have a variable number of parameters, which can be useful to counterbalance skewed data distributions. We find that sparse architectures for multilingual machine translation can perform poorly out of the box, and propose two straightforward techniques to mitigate this: a temperature heating mechanism and dense pre-training. Overall, these methods improve performance on two multilingual translation benchmarks compared to standard BASELayers and Dense scaling baselines, and in combination, more than 2x model convergence speed.
arXiv:2110.08246v1 fatcat:cbtnsy5g7fcjve3jnb2jjvb5au
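Temperature-based task sampling is a standard recipe for rebalancing skewed multi-task data; the sketch below shows its general form. The linear "heating" schedule and the constants are illustrative assumptions, not the paper's exact settings.

```python
def temperature_sample_probs(sizes, T):
    """Task sampling probabilities p_i proportional to n_i^(1/T).
    T=1 matches the raw data distribution; larger T flattens it
    toward uniform, giving low-resource tasks a bigger share."""
    weights = [n ** (1.0 / T) for n in sizes]
    total = sum(weights)
    return [w / total for w in weights]

def heated_T(step, total_steps, T_start=1.0, T_end=5.0):
    """'Heating': raise the temperature gradually over training so
    low-resource tasks receive an increasing share of updates."""
    frac = min(step / total_steps, 1.0)
    return T_start + frac * (T_end - T_start)
```

For example, with task sizes [100, 1], `T=1` samples the large task ~99% of the time, while a very high temperature approaches a 50/50 split.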

ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [article]

Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner
2019 arXiv   pre-print
., 2016; Dua et al., 2019; Lin et al., 2019).  ...  DROP (Dua et al., 2019) attempts to force models to have a more comprehensive understanding of a paragraph, by constructing questions that query many parts of the paragraph at the same time.  ... 
arXiv:1912.12598v1 fatcat:qmdmyoj73zhfldcllcybxqr3da

Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq [article]

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie
2020 arXiv   pre-print
Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs.  ...  Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi, Gabriel Ilharco, Daniel  ... 
arXiv:2010.06694v1 fatcat:5jnkjtuz4vehzea7dmntepwnja

Generative Context Pair Selection for Multi-hop Question Answering [article]

Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
2021 arXiv   pre-print
., 2019), label bias (Dua et al., 2020; Gururangan et al., 2018), survivorship bias (Min et al., 2019; Jiang and Bansal, 2019), and ascertainment bias (Jia and Liang, 2017).  ... 
arXiv:2104.08744v1 fatcat:mrm6ucfjerhsvek4a4x5ih77qa

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs [article]

Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, Matt Gardner
2019 arXiv   pre-print
Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature on this dataset and show that the best systems only achieve 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.0%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 47.0% F1.
arXiv:1903.00161v2 fatcat:w2wkwoynzrcsrefawjnys4fzzm
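The discrete operations named in the DROP abstract (addition, counting, sorting) amount to a small symbolic layer applied to numbers extracted from a paragraph. The schematic below is illustrative only; the operation names and the `diff` helper are hypothetical, not the paper's model.

```python
def discrete_op(op, numbers):
    """Apply one discrete operation to a list of numbers extracted
    from a paragraph (a schematic, not DROP's actual model)."""
    ops = {
        "add": sum,                            # e.g. "how many total yards?"
        "count": len,                          # e.g. "how many touchdowns?"
        "sort": sorted,                        # e.g. "rank the field goals"
        "diff": lambda ns: max(ns) - min(ns),  # e.g. "how many more points?"
    }
    return ops[op](numbers)
```

The hard part DROP benchmarks is not these operations themselves but locating the right spans and deciding which operation a question calls for.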

PoMo: Generating Entity-Specific Post-Modifiers in Context [article]

Jun Seok Kang, Robert L. Logan IV, Zewei Chu, Yang Chen, Dheeru Dua, Kevin Gimpel, Sameer Singh, Niranjan Balasubramanian
2019 arXiv   pre-print
We introduce entity post-modifier generation as an instance of a collaborative writing task. Given a sentence about a target entity, the task is to automatically generate a post-modifier phrase that provides contextually relevant information about the entity. For example, for the sentence, "Barack Obama, _______, supported the #MeToo movement.", the phrase "a father of two girls" is a contextually relevant post-modifier. To this end, we build PoMo, a post-modifier dataset created automatically from news articles reflecting a journalistic need for incorporating entity information that is relevant to a particular news event. PoMo consists of more than 231K sentences with post-modifiers and associated facts extracted from Wikidata for around 57K unique entities. We use crowdsourcing to show that modeling contextual relevance is necessary for accurate post-modifier generation. We adapt a number of existing generation approaches as baselines for this dataset. Our results show there is large room for improvement in terms of both identifying relevant facts to include (knowing which claims are relevant gives a >20% improvement in BLEU score), and generating appropriate post-modifier text for the context (providing relevant claims is not sufficient for accurate generation). We conduct an error analysis that suggests promising directions for future research.
arXiv:1904.03111v2 fatcat:lii6lvm6jbg3jgtyyodsoozomq

Evaluating Models' Local Decision Boundaries via Contrast Sets [article]

Matt Gardner, Yoav Artzi, Victoria Basmova, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hanna Hajishirzi (+14 others)
2020 arXiv   pre-print
., Dua et al., 2019; Dasigi et al., 2019) or to generate adversarial inputs (e.g., Zellers et al., 2018, 2019; Wallace et al., 2019b; Nie et al., 2019).  ...  Discrete Reasoning Over Paragraphs (DROP): A reading comprehension dataset that requires numerical reasoning, e.g., adding, sorting, and counting numbers in paragraphs (Dua et al., 2019).  ... 
arXiv:2004.02709v2 fatcat:zwreyqnxiveyvpktpwazmczfv4

Learning with Instance Bundles for Reading Comprehension

Dheeru Dua, Pradeep Dasigi, Sameer Singh, Matt Gardner
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
When training most modern reading comprehension models, all the questions associated with a context are treated as being independent from each other. However, closely related questions and their corresponding answers are not independent, and leveraging these relationships could provide a strong supervision signal to a model. Drawing on ideas from contrastive estimation, we introduce several new supervision losses that compare question-answer scores across multiple related instances. We normalize these scores across various neighborhoods of closely contrasting questions and/or answers, adding a cross entropy loss term in addition to traditional maximum likelihood estimation. Our techniques require bundles of related question-answer pairs, which we either mine from within existing data or create using automated heuristics. We empirically demonstrate the effectiveness of training with instance bundles on two datasets (HotpotQA and ROPES), showing up to 9% absolute gains in accuracy.
doi:10.18653/v1/2021.emnlp-main.584 fatcat:oj4nn4duqbhmdousydezuhcuum
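The cross-entropy-over-a-bundle idea in the abstract above reduces to a softmax over scores within a neighborhood of related instances rather than over each instance alone. A minimal sketch of that normalization, not the paper's exact loss:

```python
import math

def bundle_nll(scores, gold_index):
    """Negative log-likelihood of the gold answer when scores are
    normalized (softmax) across a bundle of closely related
    question-answer pairs, instead of being treated independently.
    A sketch of the contrastive-estimation idea only."""
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[gold_index]  # = -log softmax(scores)[gold]
```

Minimizing this pushes the gold pair's score above its contrasting neighbors, which is the supervision signal independent per-instance training throws away.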

Benefits of Intermediate Annotations in Reading Comprehension

Dheeru Dua, Sameer Singh, Matt Gardner
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
., 2018), DROP (Dua et al., 2019), Quoref, and ROPES (Lin et al., 2019).  ...  Both models employ a similar arithmetic block, introduced in the baseline model (Dua et al., 2019), on top of contextual representations from BERT (Devlin et al., 2019).  ... 
doi:10.18653/v1/2020.acl-main.497 fatcat:5hnb2cz6ubfvlbh5jwzwbzngea

Dynamic Sampling Strategies for Multi-Task Reading Comprehension

Ananth Gottumukkala, Dheeru Dua, Sameer Singh, Matt Gardner
2020 Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics   unpublished
Experiment Setup: The eight reading comprehension tasks are from the ORB benchmark (Dua et al., 2019b): DROP (Dua et al., 2019a), DuoRC (Saha et al., 2018), NarrativeQA (Kočisky et al., 2017),  ...  We investigate the importance of this structuring by training a multi-task model on the 8 datasets from ORB (Dua et al., 2019b), a recent multi-task reading comprehension benchmark.  ... 
doi:10.18653/v1/2020.acl-main.86 fatcat:mcjii6it6fda3gmnqlpiggpz4i

Generative Context Pair Selection for Multi-hop Question Answering

Dheeru Dua, Cicero Nogueira dos Santos, Patrick Ng, Ben Athiwaratkun, Bing Xiang, Matt Gardner, Sameer Singh
2021 Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing   unpublished
., 2019), label bias (Dua et al., 2020; Gururangan et al., 2018), survivorship bias (Min et al., 2019b; Jiang and Bansal, 2019), and ascertainment bias (Jia and Liang, 2017).  ... 
doi:10.18653/v1/2021.emnlp-main.561 fatcat:ps7gb4vfhrbvvffrh3sknjtf74

Easy, Reproducible and Quality-Controlled Data Collection with CROWDAQ

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasović, Zhen Nie
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations   unpublished
We have put more use cases into the appendix, including DROP (Dua et al., 2019), MATRES (Ning et al., 2018), TORQUE (Ning et al., 2020), VQA-E (Li et al., 2018), and two ongoing projects.  ... 
doi:10.18653/v1/2020.emnlp-demos.17 fatcat:hvibmuxfe5d7nkv22bjcebttni

Evaluating Models' Local Decision Boundaries via Contrast Sets

Matt Gardner, Yoav Artzi, Victoria Basmov, Jonathan Berant, Ben Bogin, Sihao Chen, Pradeep Dasigi, Dheeru Dua, Yanai Elazar, Ananth Gottumukkala, Nitish Gupta, Hannaneh Hajishirzi (+14 others)
2020 Findings of the Association for Computational Linguistics: EMNLP 2020   unpublished
Dheeru Dua, Yizhong Wang, Pradeep Dasigi, Gabriel Stanovsky, Sameer Singh, and Matt Gardner. 2019. DROP: A reading comprehension benchmark requiring discrete reasoning over paragraphs.  ...  ., Dua et al., 2019; Dasigi et al., 2019) or to generate adversarial inputs (e.g., Zellers et al., 2018, 2019; Wallace et al., 2019b; Nie et al., 2019).  ... 
doi:10.18653/v1/2020.findings-emnlp.117 fatcat:lnvj4ujjozh5pocryw7b233sne

Detecting Misleading Frequent Itemsets

Huỳnh Thành Lộc
2018 Tạp chí Khoa học Đại học Đà Lạt  
Identifying misleading frequent itemsets gives analysts a stronger basis for making more accurate recommendations.  ...  EXPERIMENTAL SETUP: This study implements the CFI mining algorithm and applies it to benchmark datasets from the UCI repository (Dheeru & Karra, 2017).  ...  Frequent itemsets are the basis on which experts derive predictive information from data, based on the relationships among the items appearing in those itemsets.  ... 
doi:10.37569/dalatuniversity.8.2.440(2018) fatcat:ypzhg7caizezrmv2pdzckd5pae
Showing results 1 — 15 out of 184 results