
Authorship Attribution Using Text Distortion

Efstathios Stamatatos
2017 Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers  
In this paper, we present a novel method that enhances authorship attribution effectiveness by introducing a text distortion step before extracting stylometric measures.  ...  Based on experiments on two main tasks in authorship attribution, closed-set attribution and authorship verification, we demonstrate that the proposed approach can enhance existing methods especially under  ...  We are going to examine the following three cases: • Baseline: original input texts are used (no text distortion). • DV-MA: the input texts are distorted using the Algorithm 1. • DV-SA: the input texts  ... 
doi:10.18653/v1/e17-1107 dblp:conf/eacl/Stamatatos17 fatcat:wwbnsijdtjch5ickkct55zbequ

EACH-USP Ensemble Cross-domain Authorship Attribution: Notebook for PAN at CLEF 2018

José Eleandro Custódio, Ivandré Paraboni
2018 Conference and Labs of the Evaluation Forum  
We present an ensemble approach to cross-domain authorship attribution that combines predictions made by three independent classifiers, namely, standard char n-grams, char n-grams with non-diacritic distortion  ...  Results generally outperform the PAN-CLEF 2018 baseline system that makes use of fixed-length char n-grams and linear SVM classification.  ...  At PAN-CLEF 2018, a cross-domain authorship attribution task applied to fan fiction text has been proposed.  ... 
dblp:conf/clef/CustodioP18 fatcat:vmxzltyxgjfvjndoeygz4f72aa

Overview of the Cross-domain Authorship Attribution Task at PAN 2019

Mike Kestemont, Efstathios Stamatatos, Enrique Manjavacas, Walter Daelemans, Martin Potthast, Benno Stein
2019 Conference and Labs of the Evaluation Forum  
In this edition of PAN, we focus on authorship attribution, where the task is to attribute an unknown text to a previously seen candidate author.  ...  Cross-Domain Authorship Attribution Authorship attribution [1,2,3] continues to be an important problem in information retrieval and computational linguistics, and also in applied areas such as law and  ...  A language-independent authorship attribution approach, framing attribution as a conventional text classification problem [15].  ... 
dblp:conf/clef/KestemontSMDPS19 fatcat:bjgpyegqyfc35flrqdixlpmo3u

Authorship Attribution in Fan-fictional Texts given Variable Length Character and Word n-grams

Lukas Muttenthaler, Gordon Lucas, Janek Amann
2019 Conference and Labs of the Evaluation Forum  
The task of authorship attribution (AA) requires text features to be represented according to rigorous experiments.  ...  the lexical features and content of a given text.  ...  Introduction Authorship Attribution (AA) is the task of determining the author of a text from a set of candidates.  ... 
dblp:conf/clef/MuttenthalerLA19 fatcat:kerwcy26hzajhagtf6rzrvgvuq

Cross-Domain Authorship Attribution Combining Instance Based and Profile-Based Features

Andrea Bacciu, Massimo La Morgia, Alessandro Mei, Eugenio Nerio Nemmi, Valerio Neri, Julinda Stefa
2019 Conference and Labs of the Evaluation Forum  
We use ngrams of characters, words, stemmed words, and distorted text. Our model has an SVM for each feature and an ensemble architecture.  ...  In this notebook, we propose our model for the Authorship Attribution task of PAN 2019, that focuses on cross-domain setting covering 4 different languages: French, Italian, English, and Spanish.  ...  We use different pre-processing techniques to extract features of a different meaning. We use text distortion, tokenization, stemming, and POS tagging to prepare the text for the extraction.  ... 
dblp:conf/clef/BacciuMMNNS19a fatcat:2ceoj43zwbaqnkvluy5zpyqviu

Multi-Task Learning for Authorship Attribution via Topic Approximation and Competitive Attention

Wei Song, Chen Zhao, Lizhen Liu
2019 IEEE Access  
Separating content from style is a fundamental problem in authorship attribution to represent topic independent personal style of authors.  ...  In addition to authorship attribution as the main task, we introduce a novel auxiliary task topic approximation to guide the learning of topic representations with the topic distributions inferred by topic  ...  Reference [32] proposed a text distortion solution.  ... 
doi:10.1109/access.2019.2957152 fatcat:yfaqvwo3jzfido4tux6ccufwtq

Authorship Arabic Text Detection According to Style of Writing by Using (SABA) Method

Ehsan Ali Al-Zubaidi
2017 Asian Journal of Applied Sciences  
Authorship attribution of a style of writing is a method that depends on analyzing texts in text mining, e.g., historical books and novels that famous authors wrote, attempting to measure the author's style,  ...  Assuming that these writers have a different way of writing that no other writer has; thus, authorship attribution is the essential of identifying the author of a given text [1].  ...  To be more specific, in the authorship investigation using the style of the author, we use a sub-field of text mining called "Authorship attribution" and "Stylometric Text mining".  ... 
doi:10.24203/ajas.v5i2.4750 fatcat:aat5w4ao5vdhrg5r6bg2ryneny

A Controlled-corpus Experiment in Authorship Identification by Cross-entropy

P. Juola
2005 Literary and Linguistic Computing  
In particular, despite the designed difficulty of the Dutch corpus used, the technique was still able to reliably detect not only authorship, but also subtle features of register, topic, and even the educational  ...  This paper describes an authorship, and more generally document classification, experiment on a preexisting Dutch corpus of university writings.  ...  Forsyth (1997, compiled a first benchmark collection of texts for validating authorship attribution techniques.  ... 
doi:10.1093/llc/fqi024 fatcat:cs25uj6gszgm7dru6r3jw4ekpq

Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text [article]

Rishi Balakrishnan, Stephen Sloan, Anil Aswani
2021 arXiv pre-print
Existing approaches to authorship anonymization, also known as authorship obfuscation, often focus on protecting binary demographic attributes rather than identity as a whole.  ...  , when given a sample of previous work, can match text with its author out of hundreds of possible candidates.  ...  Introduction Stylometric authorship attribution attempts to identify the author or attributes of the author of a given piece of text.  ... 
arXiv:2110.09495v1 fatcat:cc6pfo2otjb37ooeqobvt6fhay

Agree-to-Disagree (A2D): A Deep Learning based Framework for Authorship Discrimination Task in Corpus-specificity Free Manner

Md. Tawkat Islam Khondaker, Junaed Younus Khan, Tanvir Alam, M. Sohel Rahman
2020 IEEE Access  
At the first stage, it learns the authorship attributes with its Agree network.  ...  Subsequently, through its Disagree network, the framework attempts to differentiate the authorship of a new dataset (completely unrelated to the training dataset), a novel use case that has not been systematically  ...  Authorship attribution Authorship Attribution Using Text Distortion [17] 2017 Text distortion algorithm to extract stylometric features Authorship attribution Authorship Attribution for Social Media Forensics  ... 
doi:10.1109/access.2020.3021658 fatcat:364a2wb3rvawph34jqkhpkqcpu

Blogs, Twitter Feeds, and Reddit Comments: Cross-domain Authorship Attribution

Rebekah Overdorf, Rachel Greenstadt
2016 Proceedings on Privacy Enhancing Technologies  
Stylometry is a form of authorship attribution that relies on the linguistic information to attribute documents of unknown authorship based on the writing styles of a suspect set of authors.  ...  authorship attribution.  ...  Related Work Stylometry Machine learning techniques have been used, to great success, in authorship attribution of documents.  ... 
doi:10.1515/popets-2016-0021 dblp:journals/popets/OverdorfG16 fatcat:qkjpbqh5zvcvdd7yl2dv4kkkla

Cross-Domain Authorship Attribution Using Pre-trained Language Models [chapter]

Georgios Barlas, Efstathios Stamatatos
2020 IFIP Advances in Information and Communication Technology  
An especially challenging but very realistic scenario is cross-domain attribution where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre.  ...  Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities and social media analytics.  ...  The main idea is that a character-level RNN is produced using all available texts by the candidate authors while a separate output is built for each author (MHC).  ... 
doi:10.1007/978-3-030-49161-1_22 fatcat:j7kjmdxtezb4rbc3agisl23ym4

A robust authorship attribution on big period

Mubin Shoukat Tamboli, Rajesh Prasad
2019 International Journal of Electrical and Computer Engineering (IJECE)  
Authorship attribution is the task of identifying the writer of an unknown text and assigning it to a known writer. The writing style of each author is distinct and can be used for discrimination.  ...  Character n-gram, word n-gram, and POS n-gram features are used to build the model.  ...  Actually, this is a tedious task but it can be simplified using authorship attribution. Generally, messages on the web are nameless.  ... 
doi:10.11591/ijece.v9i4.pp3167-3174 fatcat:truu5jfvejbivdfafdzucnjftq

Authorship Attribution through Punctuation n-grams and Averaged Combination of SVM

Carolina Martín del Campo Rodríguez, Daniel Alejandro Pérez Alvarez, Christian Efraín Maldonado Sifuentes, Grigori Sidorov, Ildar Z. Batyrshin, Alexander F. Gelbukh
2019 Conference and Labs of the Evaluation Forum  
The use of punctuation n-grams as a feature representation of a document is introduced for the Authorship Attribution in combination with traditional character n-grams.  ...  This approach managed to obtain 0.642 with the Macro F1-score for the PAN 2019 contest of open-set Cross-Domain Authorship Attribution.  ...  [6] shows that the elimination of topic-dependent information from texts allows to improve the performance of authorship attribution classifiers.  ... 
dblp:conf/clef/RodriguezASSBG19 fatcat:krckg4stsza6pd65l6lehzccmq

Towards Authorship Attribution in Arabic Short-Microblog Text

Kamal Mansour Jambi, Imtiaz Hussain Khan, Muazzam Ahmed Siddiqui, Salma Omar Alhaj
2021 IEEE Access  
One possible extension is to use pre-trained language models like AraBert, which has recently gained much popularity on the related NLP tasks.  ...  In [13], the authors proposed a method to enhance authorship attribution effectiveness by introducing a text distortion step before extracting style-metric measures.  ...  Character n-grams were used with a Convolutional Neural Network (CNN) by [31] for authorship attribution of short texts.  ... 
doi:10.1109/access.2021.3112624 fatcat:dgbwsl6eybf7tfj4gybhgome2i