272 Hits in 4.4 sec

Extracting Lexically Divergent Paraphrases from Twitter

Wei Xu, Alan Ritter, Chris Callison-Burch, William B. Dolan, Yangfeng Ji
2014 Transactions of the Association for Computational Linguistics  
Our model also captures lexically divergent paraphrases that differ from yet complement previous methods; combining our model with previous work significantly outperforms the state-of-the-art.  ...  In addition, we present a novel annotation methodology that has allowed us to crowdsource a paraphrase corpus from Twitter. We make this new dataset available to the research community.  ...  MULTIP discovers lexically divergent paraphrases while LEXLATENT prefers more overall sentence similarity.  ... 
doi:10.1162/tacl_a_00194 fatcat:54bwppjf55asjmpi7cicq46vt4

A Continuously Growing Dataset of Sentential Paraphrases [article]

Wuwei Lan, Siyu Qiu, Hua He, Wei Xu
2017 arXiv pre-print
In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs.  ...  phrasal paraphrase extraction.  ...  As numerous Twitter users spontaneously talk about varied topics, this dataset contains many lexically divergent paraphrases.  ... 
arXiv:1708.00391v1 fatcat:ew3wzyt5rvbobd23fiedbemloa

A Continuously Growing Dataset of Sentential Paraphrases

Wuwei Lan, Siyu Qiu, Hua He, Wei Xu
2017 Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing  
In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs.  ...  phrasal paraphrase extraction.  ...  As numerous Twitter users spontaneously talk about varied topics, this dataset contains many lexically divergent paraphrases.  ... 
doi:10.18653/v1/d17-1126 dblp:conf/emnlp/LanQHX17 fatcat:yqgxihsiarg4hahwx3ijbatfpu

Idiom Paraphrases: Seventh Heaven vs Cloud Nine

Maria Pershina, Yifan He, Ralph Grishman
2015 Proceedings of the First Workshop on Linking Computational Models of Lexical, Sentential and Discourse-level Semantics  
Of particular interest in this area is the identification of paraphrases among short texts, such as SMS and Twitter.  ...  The goal of paraphrase identification is to decide whether two given text fragments have the same meaning.  ...  The state-of-the-art model for lexically divergent paraphrases on Twitter is tailored for tweets and requires topic and anchor words to be present in the sentence, which is not applicable to idiom definitions  ... 
doi:10.18653/v1/w15-2709 dblp:conf/emnlp/PershinaHG15 fatcat:hsh677tnszenjo472wityqeqhm

Paraphrasing vs Coreferring: Two Sides of the Same Coin [article]

Yehudit Meged, Avi Caciularu, Vered Shwartz, Ido Dagan
2020 arXiv pre-print
First, we used annotations from an event coreference dataset as distant supervision to re-score heuristically-extracted predicate paraphrases.  ...  We study the potential synergy between two different NLP tasks, both confronting predicate lexical variability: identifying predicate paraphrases, and event coreference resolution.  ...  Acknowledgements This work was supported in part by grants from Intel Labs, Facebook, the Israel Science Foundation grant 1951/17, the Israeli Ministry of Science and Technology and the German Research  ... 
arXiv:2004.14979v2 fatcat:rbl4jmlibvdbnjvbetl4st4jk4

Assessing the Robustness of Conversational Agents using Paraphrases

Jonathan Guichard, Elayne Ruane, Ross Smith, Dan Bean, Anthony Ventresque
2019 2019 IEEE International Conference On Artificial Intelligence Testing (AITest)  
Paraphrases, which are different ways of expressing the same intent, are generated based on known working input by performing lexical substitutions.  ...  In this paper we explore the use of paraphrases as a testing tool for conversational agents.  ...  select sentences that are paraphrases of each other with the sentences being extracted from tweets of trending topics [23] .  ... 
doi:10.1109/aitest.2019.000-7 dblp:conf/aitest/GuichardRSBV19 fatcat:chskvizbsnhmtfm2t5adwtwxda

SLEDDED: A Proposed Dataset of Event Descriptions for Evaluating Phrase Representations

Laura Rimell, Eva Maria Vecchi
2016 Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP  
We propose SLEDDED (Syntactically and LExically Divergent Dataset of Event Descriptions), a dataset of event descriptions in which related phrase pairs are designed to exhibit minimal lexical and syntactic  ...  We describe a method for extracting candidate pairs from a corpus based on occurrences of event nouns (e.g. war) and a two-step annotation process consisting of expert annotation followed by crowdsourcing  ...  We considered several existing methods for automatic extraction of paraphrases that are lexically or syntactically divergent; however, none are exactly suited for our proposed dataset.  ... 
doi:10.18653/v1/w16-2525 dblp:conf/repeval/RimellV16 fatcat:kg35frkii5c3nfrqxdm3zbvze4

Inferring Perceived Demographics from User Emotional Tone and User-Environment Emotional Contrast

Svitlana Volkova, Yoram Bachrach
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Our analysis is based on a large Twitter dataset, consisting of the tweets of 123,513 users from the USA and Canada.  ...  rely solely on user communications, we explore the network structure and show that it is possible to accurately predict a range of perceived demographic traits based solely on the emotions emanating from  ...  contrast), and All (EmoSent + Lexical) features extracted from user content. and surprise emotions can be predictive of users with higher education.  ... 
doi:10.18653/v1/p16-1148 dblp:conf/acl/VolkovaB16 fatcat:sw25zkmv7jga3mtb4tmvt7rlhe

Modal Sense Classification At Large

Ana Marasović, Mengfei Zhou, Alexis Palmer, Anette Frank
2016 Linguistic Issues in Language Technology  
Previous work on modal sense classification achieved relatively high performance using shallow lexical and syntactic features drawn from small-size annotated corpora.  ...  In this work we create large-scale, high-quality annotated corpora for modal sense classification using an automatic paraphrase-driven projection approach.  ...  We thank Annemarie Friedrich for advice on feature extraction, preparations for the MASC data, as well as her contributions to Zhou et al. (2015) .  ... 
doi:10.33011/lilt.v14i.1397 fatcat:mnmn3r5fubf7vi5xmrq27riyxe

Challenges in Emotion Style Transfer: An Exploration with a Lexical Substitution Pipeline [article]

David Helbig, Enrica Troiano, Roman Klinger
2020 arXiv pre-print
to change the emotion (with a brute-force approach and selection based on the attention mechanism of an emotion classifier), (2) find sets of words as candidates for substituting the words (based on lexical  ...  This comparably straight-forward setup enables us to explore the task and understand in what cases lexical substitution can vary the emotional load of texts, how changes in content and style interact and  ...  Objective The set of candidate paraphrases produced at substitution time, based on the selections, are an overgeneration which might not be fluent, diverge from the original meaning, and might not contain  ... 
arXiv:2005.07617v1 fatcat:642ppn2tdrhbhkzou6b2pw4vfm

Debugging Frame Semantic Role Labeling [article]

Alexandre Kabbach
2019 arXiv pre-print
We propose a quantitative and qualitative analysis of the performances of statistical models for frame semantic structure extraction.  ...  using a rule-based algorithm combining valence pattern matching and lexical substitution.  ...  They expanded this model with a similarity graph computed with WordNet to handle frame identification for unseen lexical units.  ... 
arXiv:1901.07475v1 fatcat:7sktiu6yjnawnomsqvh5sgde7u

Generating Black-Box Adversarial Examples for Text Classifiers Using a Deep Reinforced Model [article]

Prashanth Vijayaraghavan, Deb Roy
2019 arXiv pre-print
In natural language domain, small perturbations in the form of misspellings or paraphrases can drastically change the semantics of the text.  ...  These paraphrase datasets together contains text from various sources: Common Crawl, CzEng1.6, Europarl, News Commentary, Quora questions, and Twitter trending topic tweets.  ...  Dataset Collection In this paper, we use paraphrase datasets like PARANMT-50M corpus [37] , Quora Question Pair dataset and Twitter URL paraphrasing corpus [23] .  ... 
arXiv:1909.07873v1 fatcat:lug35v7xd5d27k5tjy6js6cq5a

Extraction of Code-mixed Aspect Topics in Semantic Representation

Kavita Sanjay Asnani, Jyoti D Pawar
2018 Journal of Computación y Sistemas  
As a standard, topic modeling has a potential of extracting aspects pertaining to opinion data from large text.  ...  the state-of-the-art techniques used for aspect extraction of code-mixed data.  ...  Extracting product features and opinions from reviews.  ... 
doi:10.13053/cys-22-1-2771 fatcat:myj6nppbrzfhdj36qyokyo4o64

TweetNorm: a benchmark for lexical normalization of Spanish tweets

Iñaki Alegria, Nora Aranberri, Pere R. Comas, Víctor Fresno, Pablo Gamallo, Lluis Padró, Iñaki San Vicente, Jordi Turmo, Arkaitz Zubiaga
2015 Language Resources and Evaluation  
In this paper we present a benchmark for lexical normalization of social media posts, specifically for tweets in Spanish language.  ...  The organization of this challenge has led to the production of a benchmark for lexical normalization of social media, including an evaluation framework, as well as an annotated corpus of Spanish tweets  ...  from other corpora to identify common misspellings on the Internet and Twitter.  ... 
doi:10.1007/s10579-015-9315-6 fatcat:bhltncndhfazpkobax6w57gzgu

Evaluating the motivation of Red Cross Health volunteers in the COVID-19 pandemic: a mixed-methods study protocol

Leonardo W Heyerdahl, Muriel Vray, Vincent Leger, Lénaig Le Fouler, Julien Antouly, Virginie Troit, Tamara Giles-Vernick
2021 BMJ Open  
These findings will iteratively shape and be influenced by a social media (Twitter) analysis of biomedical and public health uncertainties and debates around COVID-19.  ...  Data collection began on 15 June 2020 and will continue until 15 April 2021. Ethics and dissemination: The protocol has received ethical approval from the Institut Pasteur Institutional Review Board (no 2020  ...  Our analysis of Twitter data, using anthropological coding procedures, will aggregate and paraphrase these data to protect fully the identities of those producing tweets.  ... 
doi:10.1136/bmjopen-2020-042579 pmid:33500285 fatcat:iffqaqxfnnexvne4hsqkup4yfe
Showing results 1 to 15 out of 272 results