Content Facets For Individual Information Needs In Media

Elisabeth Lex, Stefanie Lindstaedt, Michael Granitzer, Harald Kosch
2018 Zenodo  
Second, a feature study revealed that stylometric features are better suited to assess topic independent content facets, while for topic oriented content facets Bag-of-Words features serve best.  ...  To address the problem of lacking training data in classification, this thesis investigated whether available classification schemes from traditional media can be mapped onto blogs.  ...  5.19 Emotion Classification in Blogs: Lexical Features on Blog Post Level  ...  5.20 Emotion Classification in Blogs: Stylometric Features on Blog Post Level  ... 
doi:10.5281/zenodo.1196397 fatcat:udr3736ejbek5lzl34tu4g4ppq

Multi-domain Alias Matching Using Machine Learning

Michael Ashcroft, Fredrik Johansson, Lisa Kaati, Amendra Shrestha
2016 2016 Third European Network Intelligence Conference (ENIC)  
The use of emotion-related and Twitter-related features yield no significant impact on the results.  ...  Experiments show that combining stylometric and time-based features yield good results on our synthetic datasets and a small-scale evaluation on real-world blog data confirm these results, yielding a precision  ...  In such circumstances we use notation such as S+T+E, which should be interpreted as the feature vector obtained when combining stylometric features with time-based features and the new emotion-related  ... 
doi:10.1109/enic.2016.019 dblp:conf/enic/AshcroftJKS16 fatcat:irpr3u544revpkqbbk5imbgw5e

Content Facets For Individual Information Needs In Media

Elisabeth Lex, Stefanie Lindstaedt, Michael Granitzer, Harald Kosch
2018 Zenodo  
Second, a feature study revealed that stylometric features are better suited to assess topic independent content facets, while for topic oriented content facets Bag-of-Words features serve best.  ...  To address the problem of lacking training data in classification, this thesis investigated whether available classification schemes from traditional media can be mapped onto blogs.  ...  Stylometric Features for Emotion Level Classification in News Related Blogs.  ... 
doi:10.5281/zenodo.1195993 fatcat:ce3ljnthjfhkpir3y4atnnlicy

Online Social Networks and Writing Styles — A Review of the Multidisciplinary Literature

Kah Yee Tai, Jasbir Dhaliwal, Shafiza Mohd Shariff
2020 IEEE Access  
Thus, in this paper, we also propose a novel machine learning prediction model based on tense morphology, to classify age and gender from English blogs, and the PAN 2013 dataset.  ...  This model achieves an accuracy of 94%-98% and 95%-97% for age and gender, respectively. INDEX TERMS Online social networks, survey, writing styles.  ...  Other than kNN and SMO, Artificial Neural Networks (ANN) has also been applied in classification tasks using stylometric features to determine authorship of documents.  ... 
doi:10.1109/access.2020.2985916 fatcat:qpejezxkmveyhmpazdz5a3mf7q

Sentiment Mining from Text: A Technical Review

Mita K.
2017 International Journal of Computer Applications  
mining tasks such as, 'detecting the presence of emotion in text', 'selecting a model for representing emotion', 'classifying the sentiment polarity of text' and 'measuring the intensity of the expressed  ...  important areas for future scope of research in this field.  ...  Recent approaches for sentiment polarity classification are based on semantic feature set extraction [4], [28], [29].  ... 
doi:10.5120/ijca2017915126 fatcat:zqqpafrvhzhxtb25s7qlbrx7du

'twazn me!!! ;(' Automatic Authorship Analysis of Micro-Blogging Messages [chapter]

Rui Sousa Silva, Gustavo Laboreiro, Luís Sarmento, Tim Grant, Eugénio Oliveira, Belinda Maia
2011 Lecture Notes in Computer Science  
In this paper we propose a set of stylistic markers for automatically attributing authorship to micro-blogging messages.  ...  For that purpose, we train SVM classifiers to learn stylometric models for each author based on different combinations of the groups of stylistic features that we propose.  ...  We chose to use Support Vector Machines (SVM) [14] as the classification algorithm for its proven effectiveness in text classification tasks and robustness in handling a large number of features.  ... 
doi:10.1007/978-3-642-22327-3_16 fatcat:znnijiqtjffrjembgph5b4cdaq

Taking the Pulse of US College Campuses with Location-Based Anonymous Mobile Apps

Yanqiu Wu, Tehila Minkus, Keith W. Ross
2017 ACM Transactions on Intelligent Systems and Technology  
We deploy GPS hacking in conjunction with location-based mobile apps to passively survey users in targeted geographical regions.  ...  We collect nearly 1.6 million Yik Yak messages ("yaks") from a diverse set of 45 college campuses in the United States.  ...  Preprocessing for Prediction: We use two kinds of features, bag-of-words features and stylometric features, for gender prediction. Details about the stylometric features can be found in Table 3.  ... 
doi:10.1145/3078843 fatcat:izcmqzho5fbxnpbxhqixyy2kbm

On the use of distributed semantics of tweet metadata for user age prediction

Abhinay Pandya, Mourad Oussalah, Paola Monachesi, Panos Kostakos
2019 Future generations computer systems  
For this purpose, an innovative model based on Convolutional Neural Network is devised. To this end, we rely on language-related features and social media specific metadata.  ...  More specifically, we introduce two features that have not been previously considered in the literature: the content of URLs and hashtags appearing in tweets.  ...  The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.  ... 
doi:10.1016/j.future.2019.08.018 fatcat:glstyfyt3vbk3ggrdtwo4gvdpu

On the Feasibility of Internet-Scale Author Identification

Arvind Narayanan, Hristo Paskov, Neil Zhenqiang Gong, John Bethencourt, Emil Stefanov, Eui Chul Richard Shin, Dawn Song
2012 2012 IEEE Symposium on Security and Privacy  
We also develop novel techniques for confidence estimation of classifier outputs. Finally, we demonstrate stylometric authorship recognition on texts written in different contexts.  ...  In over 20% of cases, our classifiers can correctly identify an anonymous author given a corpus of texts from 100,000 authors; in about 35% of cases the correct author is one of the top 20 guesses.  ...  We would like to thank the authors of [28] for sharing the notes of their reimplementation of the Writeprints algorithm from [23] and the authors of [46] for sharing their Google profiles dataset  ... 
doi:10.1109/sp.2012.46 dblp:conf/sp/NarayananPGBSSS12 fatcat:3qowkfszvzgkxl3a63ki6hjbpq

Is this hotel review truthful or deceptive? A platform for disinformation detection through computational stylometry

Antonio Pascucci, Raffaele Manna, Ciro Caterino, Vincenzo Masucci, Johanna Monti
2020 International Conference on Language Resources and Evaluation  
In this paper, we present a web service platform for disinformation detection in hotel reviews written in English.  ...  We investigated four different classifiers and we detected that Simple Logistic is the most performing algorithm for this type of classification.  ...  The phenomenon is uncontrollable, especially if we consider that social media and blogs are breeding grounds for news diffusion and that the higher the number of sharing of news, the more people are reached  ... 
dblp:conf/lrec/PascucciMCMM20 fatcat:x6x4lmxmefgjraemwwuix2vccy

Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace? [article]

Tommi Gröndahl, N. Asokan
2019 arXiv   pre-print
Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques.  ...  Textual deception constitutes a major problem for online security.  ...  [123] give two plausible reasons for the commonly observed effectiveness of short character n-grams in comparison to high-level properties like abstract grammatical relations.  ... 
arXiv:1902.08939v2 fatcat:qjbxcq5fpjaubj5z5xii3v44mu

Multi-label emotion classification of Urdu tweets

Noman Ashraf, Lal Khan, Sabur Butt, Hsien-Tsung Chang, Grigori Sidorov, Alexander Gelbukh
2022 PeerJ Computer Science  
The paper highlights the annotation guidelines, dataset characteristics and insights into different methodologies used for Urdu based emotion classification.  ...  A multi-label (ML) classification approach was adopted to detect emotions from Urdu.  ...  "Related Work" explains the related work on multi-label emotion classification datasets and techniques.  ... 
doi:10.7717/peerj-cs.896 pmid:35494831 pmcid:PMC9044368 fatcat:vegch3fbe5devbwamgeonnomi4

CAPS: A Cross-genre Author Profiling System

Ivan Bilan, Desislava Zhekova
2016 Conference and Labs of the Evaluation Forum  
The classification system considers parts-of-speech, collocations, connective words and various other stylometric features to differentiate between the writing styles of male and female authors as well  ...  Further, for age classification, we report accuracy of 44.87% (BPS: 58.97%).  ...  len() is a function which, given a text sample, returns its length either in tokens or in characters, which makes this interpretation suitable for both types of features that work on the level of  ... 
dblp:conf/clef/BilanZ16 fatcat:64v24plrzbbhjj7zqjwjezzqze

Using Bag-of-Words and Psycho-Linguistic features for MAPonSMS

Asmara Safdar, Osama Akhter, Osama Inayat, Abdullah Khalid
2018 Forum for Information Retrieval Evaluation  
The data set was provided as a standard source to work for the multilingual author profiling task in the contest FIRE'18-MAPonSMS.  ...  ) for gender and age prediction.  ...  Feature Extraction For Gender Classification: For gender classification task, we used 67 stylometric features in groups of three namely: character based (Table 1), vocabulary richness (Table 2) and word  ... 
dblp:conf/fire/SafdarAIK18 fatcat:llklelhjofb2nlr53uvbtpwxam

Text Analysis in Adversarial Settings

Tommi Gröndahl, N. Asokan
2019 ACM Computing Surveys  
Many studies have argued that deceptiveness leaves traces in writing style, which could be detected using text classification techniques.  ...  Textual deception constitutes a major problem for online security.  ...  style obfuscation as any method aimed at fooling stylometric classification.  ... 
doi:10.1145/3310331 fatcat:563vjvd63fcdnnswmvmsxthu7e