215 Hits in 0.49 sec

Multilingual Modal Sense Classification using a Convolutional Neural Network [article]

Ana Marasović, Anette Frank
2016-08-18 · arXiv · pre-print
Modal sense classification (MSC) is a special WSD task that depends on the meaning of the proposition in the modal's scope. We explore a CNN architecture for classifying modal sense in English and German. We show that CNNs are superior to manually designed feature-based classifiers and a standard NN classifier. We analyze the feature maps learned by the CNN and identify known and previously unattested linguistic features. We benchmark the CNN on a standard WSD task, where it compares favorably to models using sense-disambiguated target vectors.
arXiv:1608.05243v1 · fatcat:w7ycaxtobzbdfnlzc766etthwe

A Mention-Ranking Model for Abstract Anaphora Resolution [article]

Ana Marasović, Leo Born, Juri Opitz, Anette Frank
2017-07-21 · arXiv · pre-print
Resolving abstract anaphora is an important but difficult task for text understanding. Yet, with recent advances in representation learning, this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence–antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and, if disregarding syntax, discriminates candidates using deeper features.
arXiv:1706.02256v2 · fatcat:zbu4zwaspzfrrhhfln4eythbjm

Explaining NLP Models via Minimal Contrastive Editing (MiCE) [article]

Alexis Ross, Ana Marasović, Matthew E. Peters
2021-06-23 · arXiv · pre-print
… 2016), or attention (Wiegreffe and Pinter, 2019; Sun and Marasović, 2021). …
arXiv:2012.13985v2 · fatcat:v2f7ll62a5d7vkkyp4vgrzlmuq

Few-Shot Self-Rationalization with Natural Language Prompts [article]

Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters
2022-04-26 · arXiv · pre-print
The free-text format is essential for explaining tasks requiring reasoning about unstated knowledge such as commonsense (Marasović et al., 2020), and it makes explanations more intuitive to people compared … few-shot evaluation (Bragg et al., 2021). … Human Evaluation: For our final models (§4), we conduct a human evaluation of plausibility of generated explanations following prior work (Kayser et al., 2021; Marasović …
arXiv:2111.08284v2 · fatcat:nkwgkxl2draizccdbyi6s6hxxi

Easy, Reproducible and Quality-Controlled Data Collection with Crowdaq [article]

Qiang Ning, Hao Wu, Pradeep Dasigi, Dheeru Dua, Matt Gardner, Robert L. Logan IV, Ana Marasovic, Zhen Nie
2020-10-06 · arXiv · pre-print
High-quality and large-scale data are key to success for AI systems. However, large-scale data annotation efforts are often confronted with a set of common challenges: (1) designing a user-friendly annotation interface; (2) training enough annotators efficiently; and (3) reproducibility. To address these problems, we introduce Crowdaq, an open-source platform that standardizes the data collection pipeline with customizable user-interface components, automated annotator qualification, and saved pipelines in a re-usable format. We show that Crowdaq simplifies data annotation significantly on a diverse set of data collection use cases, and we hope it will be a convenient tool for the community.
arXiv:2010.06694v1 · fatcat:5jnkjtuz4vehzea7dmntepwnja

Modal Sense Classification At Large

Ana Marasović, Mengfei Zhou, Alexis Palmer, Anette Frank
2016-08-01 · Linguistic Issues in Language Technology (University of Colorado at Boulder)
Recent work in Marasović and Frank (2016) applied the same method to automatically acquire a large dataset for modal sense classification for German. … Recent work in Marasović and Frank (2016) shows that convolutional neural networks are able to improve on manually crafted feature-based approaches and are easily portable to novel languages, while preserving …
doi:10.33011/lilt.v14i.1397 · fatcat:mnmn3r5fubf7vi5xmrq27riyxe

Measuring Association Between Labels and Free-Text Rationales [article]

Sarah Wiegreffe, Ana Marasović, Noah A. Smith
2021-09-10 · arXiv · pre-print
We additionally use the term "rationale" to also mean "explanation"; for a more detailed discussion of terminology, see Jacovi and Goldberg (2021); Wiegreffe and Marasović (2021). … However, using dataset collection to explicitly collect sufficient rationales does not address the unnaturalness of such a task formulation (Wiegreffe and Marasović, 2021). Table 5 indicates that (especially …
arXiv:2010.12762v3 · fatcat:oljeqtqbdfa2he66rbrcca4mjy

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [article]

Shruti Palaskar, Akshita Bhagia, Yonatan Bisk, Florian Metze, Alan W Black, Ana Marasovic
2022-05-24 · arXiv · pre-print
This is beneficial since text generation improves with model size (Brown et al., 2020), incl. self-rationalization (Marasović et al., 2022). … This is in contrast to self-rationalization of text-only inputs, where performance monotonically increases with T5's model size (Marasović et al., 2022). …
arXiv:2205.11686v1 · fatcat:x6w6l4ivgvh6big7t4fl573yny

Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs [article]

Ana Marasović, Chandra Bhagavatula, Jae Sung Park, Ronan Le Bras, Noah A. Smith, Yejin Choi
2020-10-15 · arXiv · pre-print
Natural language rationales could provide intuitive, higher-level explanations that are easily understandable by humans, complementing the more broadly studied lower-level explanations based on gradients or attention weights. We present the first study focused on generating natural language rationales across several complex visual reasoning tasks: visual commonsense reasoning, visual-textual entailment, and visual question answering. The key challenge of accurate rationalization is image understanding at all levels: not just the explicit content at the pixel level, but the contextual contents at the semantic and pragmatic levels. We present Rationale^VT Transformer, an integrated model that learns to generate free-text rationales by combining pretrained language models with object recognition, grounded visual semantic frames, and visual commonsense graphs. Our experiments show that the base pretrained language model benefits from visual adaptation and that free-text rationalization is a promising research direction to complement model interpretability for complex visual-textual reasoning tasks.
arXiv:2010.07526v1 · fatcat:6vafbtt34rccrfly327tjr4pwe

A Mention-Ranking Model for Abstract Anaphora Resolution

Ana Marasovic, Leo Born, Juri Opitz, Anette Frank
2017 · Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (Association for Computational Linguistics)
Resolving abstract anaphora is an important but difficult task for text understanding. Yet, with recent advances in representation learning, this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence–antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We found model variants that outperform the baselines for nominal anaphors, without training on individual anaphor data, but still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and, if disregarding syntax, discriminates candidates using deeper features.
doi:10.18653/v1/d17-1021 · dblp:conf/emnlp/MarasovicBOF17 · fatcat:tsi2bejulvcjbhx2yua7zjemvu

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning [article]

Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
2019-09-05 · arXiv · pre-print
Machine comprehension of texts longer than a single sentence often requires coreference resolution. However, most current reading comprehension benchmarks do not contain complex coreferential phenomena and hence fail to evaluate the ability of models to resolve coreference. We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia. Obtaining questions focused on such phenomena is challenging, because it is hard to avoid lexical cues that shortcut complex reasoning. We deal with this issue by using a strong baseline model as an adversary in the crowdsourcing loop, which helps crowdworkers avoid writing questions with exploitable surface cues. We show that state-of-the-art reading comprehension models perform significantly worse than humans on this benchmark: the best model performance is 70.5 F1, while the estimated human performance is 93.4 F1.
arXiv:1908.05803v2 · fatcat:7cb25dknwjb67pbwmhbarwryzq

Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus [article]

Jesse Dodge, Maarten Sap, Ana Marasović, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Margaret Mitchell, Matt Gardner
2021-09-30 · arXiv · pre-print
Large language models have led to remarkable progress on many NLP tasks, and researchers are turning to ever-larger text corpora to train them. Some of the largest corpora available are made by scraping significant portions of the internet, and are frequently introduced with only minimal documentation. In this work we provide some of the first documentation for the Colossal Clean Crawled Corpus (C4; Raffel et al., 2020), a dataset created by applying a set of filters to a single snapshot of Common Crawl. We begin by investigating where the data came from, and find a significant amount of text from unexpected sources like patents and US military websites. Then we explore the content of the text itself, and find machine-generated text (e.g., from machine translation systems) and evaluation examples from other benchmark NLP datasets. To understand the impact of the filters applied to create this dataset, we evaluate the text that was removed, and show that blocklist filtering disproportionately removes text from and about minority individuals. Finally, we conclude with some recommendations for how to create and document web-scale datasets from a scrape of the internet.
arXiv:2104.08758v2 · fatcat:s3gkabvc7bhf7f6kt4r6ff6t6q

Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [article]

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith
2020-05-05 · arXiv · pre-print
Language models pretrained on text from a wide variety of sources form the foundation of today's NLP. In light of the success of these broad-coverage models, we investigate whether it is still helpful to tailor a pretrained model to the domain of a target task. We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks, showing that a second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains, under both high- and low-resource settings. Moreover, adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining. Finally, we show that adapting to a task corpus augmented using simple data selection strategies is an effective alternative, especially when resources for domain-adaptive pretraining might be unavailable. Overall, we consistently find that multi-phase adaptive pretraining offers large gains in task performance.
arXiv:2004.10964v3 · fatcat:cwmjixjpcve25hezsj6kgjxdji

SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling [article]

Ana Marasović, Anette Frank
2018-04-19 · arXiv · pre-print
For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e., Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We found that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs the best. With deeper analysis we determine what works and what might be done to make further improvements for ORL.
arXiv:1711.00768v3 · fatcat:mi7vca5xnzerpl7xdhvyuxa5ra

SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning with Semantic Role Labeling

Ana Marasović, Anette Frank
2018 · Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (Association for Computational Linguistics)
For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform the state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address this issue using different multi-task learning (MTL) techniques with a related task which has substantially more data, i.e., Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling of both holders and targets, on the development and the test sets. We found that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs the best. With deeper analysis we determine what works and what might be done to make further improvements for ORL.
doi:10.18653/v1/n18-1054 · dblp:conf/naacl/MarasovicF18 · fatcat:leo5fmomd5dybknly2lk4tfjcq
Showing results 1–15 of 215