DTSim at SemEval-2016 Task 1: Semantic Similarity Model Including Multi-Level Alignment and Vector-Based Compositional Semantics

Rajendra Banjade, Nabin Maharjan, Dipesh Gautam, Vasile Rus
2016 Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)  
We developed a Support Vector Regression model with various features, including similarity scores calculated using alignment-based and semantic-composition-based methods.  ...  Interpretable Feature Based Method We aligned chunks from one sentence to another and assigned semantic relations and similarity scores for each alignment.  ...  Number of EQUI, OPPO, REL, SIMI, and SPE relations in aligning chunks between texts relative to the total number of alignments. 12.  ...
doi:10.18653/v1/s16-1097 dblp:conf/semeval/BanjadeMGR16 fatcat:4jhb2yrkbfcxhea5dkvwwrm5du

Overview of the PAN/CLEF 2015 Evaluation Lab [chapter]

Efstathios Stamatatos, Martin Potthast, Francisco Rangel, Paolo Rosso, Benno Stein
2015 Lecture Notes in Computer Science  
During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally.  ...  In plagiarism detection, community-driven corpus construction is introduced as a new way of developing evaluation resources with diversity.  ...  Related Work Research on plagiarism detection has a long history, both within PAN and without.  ... 
doi:10.1007/978-3-319-24027-5_49 fatcat:fcpf2p7nujet5ez4zswoiscatq

Plagiarism Detection for Indonesian Texts

Lucia D. Krisnawati, Klaus U. Schulz
2013 Proceedings of International Conference on Information Integration and Web-based Applications & Services - IIWAS '13  
Expectantly, this alignment method will increase the recognition rate on summarized passages too. 4 .  ...  We plan to incorporate sentence alignment which collects contextual evidence and exploits word similarity introduced in [172] to increase system's recognition on heavily paraphrased passages.  ...  Glinos applies 3-step clustering based on topic related words.  ... 
doi:10.1145/2539150.2539213 dblp:conf/iiwas/KrisnawatiS13 fatcat:r6p2h4oiq5fi3mhlazokatknrq

Academic Plagiarism Detection

Tomáš Foltýnek, Norman Meuschke, Bela Gipp
2019 ACM Computing Surveys  
Since we seek to cover the most influential papers on academic plagiarism detection, we consider a relevance ranking based on citation counts as an advantage rather than a disadvantage.  ...  Plagiarized research papers can skew meta-studies and thus jeopardize patient safety [65]. Furthermore, academic plagiarism wastes resources.  ...  Most research papers on text-based plagiarism detection methods we review in this article do not describe any format conversion or text extraction procedures.  ...
doi:10.1145/3345317 fatcat:yk6f5xl2kvdxlhvsolem6zfdsu

SimCompass: Using Deep Learning Word Embeddings to Assess Cross-level Similarity

Carmen Banea, Di Chen, Rada Mihalcea, Claire Cardie, Janyce Wiebe
2014 Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)  
Using a meta-learning framework, we experiment with traditional knowledge-based metrics, as well as novel corpus-based measures based on deep learning paradigms, paired with varying degrees of context expansion  ...  Acknowledgments This material is based in part upon work supported by National Science Foundation CAREER award #1361274 and IIS award #1018613 and by DARPA-BAA-12-47 DEFT grant #12475008.  ...  These scores then become features for a meta-learner, which is able to optimize their impact on the prediction process.  ...
doi:10.3115/v1/s14-2098 dblp:conf/semeval/BaneaCMCW14 fatcat:by5gxcb73vd23l6oqxra4rlmp4

Multi-Retranslation Corpora: Visibility, Variation, Value, and Virtue

2016 Literary and Linguistic Computing  
We present a web-based system which enables users to create parallel, segment-aligned multi-version corpora, and provides visual interfaces for exploring multiple translations, with their variation projected onto a base text.  ...  Even that is only a start, as Flanagan points out: Ebla can be used to calculate different kinds of variation statistics for base text segments based on aligned corpus content.  ...
doi:10.1093/llc/fqw027 fatcat:rtb4xroedza23latis7fgpitca

A Novel Approach for Developing Paraphrase Detection System using Machine Learning

Rudradityo Saha, G. Bharadwaja Kumar
2021 International Journal of Computer Applications  
To identify cases of plagiarism and hence discourage the same, this paper presents a novel Supervised Machine Learning based Paraphrase Detection System developed by conducting experiments using Microsoft Research Paraphrase (MSRP) Corpus and assessed on the same.  ...  Features based on monotonic and non-monotonic alignments, and semantic features, namely Boolean features are utilized in the suggested method.  ...
doi:10.5120/ijca2021921389 fatcat:clya63hxdvfz3kpmszhrjmhc2i

Two Case Studies of Experience Prototyping Machine Learning Systems in the Wild [article]

Qian Yang
2019 arXiv   pre-print
For example, physicians asked: Are ML predictions made based on clinicians' best efforts? Is it ethical to make decisions based on previous patients' collective outcomes?  ...  even considered adopting machine suggestions as plagiarism, therefore "is simply wrong".  ...  ACKNOWLEDGEMENT The contents of this paper were developed under grants from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR grant numbers 90RE5011 and 90REGE0007  ... 
arXiv:1910.09137v1 fatcat:qvum4atkl5gmjk3ajrx3bqggpe

Improving the Reproducibility of PAN's Shared Tasks: [chapter]

Martin Potthast, Tim Gollub, Francisco Rangel, Paolo Rosso, Efstathios Stamatatos, Benno Stein
2014 Lecture Notes in Computer Science  
This paper reports on the PAN 2014 evaluation lab which hosts three shared tasks on plagiarism detection, author identification, and author profiling.  ...  We evaluate different aspects of plagiarism and text reuse detectors within the two tasks source retrieval and text alignment.  ...  Table 2 shows the overall performance of eleven plagiarism detectors that implemented text alignment.  ... 
doi:10.1007/978-3-319-11382-1_22 fatcat:anztewljlbgznjdipotoxcdp2q

Revise and Resubmit: An Intertextual Model of Text-based Collaboration in Peer Review [article]

Ilia Kuznetsov, Jan Buchmann, Max Eichler, Iryna Gurevych
2022 arXiv   pre-print
cycle: pragmatic tagging, linking and long-document version alignment.  ...  While existing NLP studies focus on the analysis of individual texts, editorial assistance often requires modeling interactions between pairs of texts -- yet general frameworks and datasets to support  ...  While work on review score and acceptance prediction based on the whole review or paper text is abundant, the applications of NLP to assist the process of reviewing itself are few: the argumentation mining  ... 
arXiv:2204.10805v1 fatcat:2xxypbobzzakrcb2oxvdmobxsu

Users and Data: The Two Neglected Children of Bilingual Natural Language Processing Research

Philippe Langlais
2017 Proceedings of the 10th Workshop on Building and Using Comparable Corpora  
I argue that this state of affairs is mainly due to two factors: the emphasis published authors put on models (even though data is as important), and the conspicuous lack of concern for actual end-users  ...  The impact of sentence alignment errors on phrase-based machine translation performance. In 10th AMTA. Grégoire and Philippe Langlais. 2017.  ...  Target-text mediated interactive machine translation. Machine Translation 12(1):175-194. George Foster, Philippe Langlais, and Guy Lapalme. 2002. User-friendly text prediction for translators.  ... 
doi:10.18653/v1/w17-2501 dblp:conf/acl-bucc/Langlais17 fatcat:r6z7ztddfrh4jjyhkr7u7eb7ei

Biomedical text mining for research rigor and integrity: tasks, challenges, directions

Halil Kilicoglu
2017 Briefings in Bioinformatics  
In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information  ...  Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research.  ...  Rindflesch, Olivier Bodenreider and Caroline Zeiss for their comments on earlier drafts of this article.  ... 
doi:10.1093/bib/bbx057 pmid:28633401 fatcat:va4d3u6zzjbpnfptseb23tnv7y

UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering [article]

Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, Paolo Rosso
2018 arXiv   pre-print
Our system represents instances by using both lexical and semantic-based similarity measures between text pairs.  ...  For each subtask we optimized a threshold to determine the relevance of each instance that is based on our predicted ranking relevance. We used the Spearmint toolkit: https://github.com/HIPS/Spearmint  ...  Other works such as Buscaldi et al. (2010) are based on the redundancy of n-grams in order to find one or more text fragments that include tokens of the original question and the answer.  ...
arXiv:1807.11584v1 fatcat:gujwabqoqba73ocyljexjsdpyy

UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering

Marc Franco-Salvador, Sudipta Kar, Thamar Solorio, Paolo Rosso
2016 Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)  
Our system represents instances by using both lexical and semantic-based similarity measures between text pairs.  ...  For each subtask we optimized a threshold to determine the relevance of each instance that is based on our predicted relevance ranking. In other words, we binarize our ranking.  ...  Other works such as Buscaldi et al. (2010) are based on the redundancy of n-grams in order to find one or more text fragments that include tokens of the original question and the answer.  ... 
doi:10.18653/v1/s16-1126 dblp:conf/semeval/Franco-Salvador16 fatcat:buyjlvpzejh6lkhwhk7y6nejlm

Biomedical Text Mining for Research Rigor and Integrity: Tasks, Challenges, Directions [article]

Halil Kilicoglu
2017 bioRxiv   pre-print
In particular, we identify four key areas in which text mining techniques can make a significant contribution: plagiarism/fraud detection, ensuring adherence to reporting guidelines, managing information  ...  Recent years have seen a flurry of activities focusing on standardization and guideline development to enhance the reproducibility and rigor of biomedical research.  ...  Rindflesch, Olivier Bodenreider, and Caroline Zeiss for their comments on earlier drafts of this paper. Funding This work was supported by the intramural research program at the U.S.  ... 
doi:10.1101/108480 fatcat:7thsz7zjozfqbmqmkokqrn4bki