10 Hits in 0.44 sec

personality-detection-using-bagged-svm-over-bert.pdf [article]

Amirmohammad Kazemeini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, Erik Cambria
2020 figshare.com  
Recently, the automatic prediction of personality traits has received increasing attention and has emerged as a hot topic within the field of affective computing. In this work, we present a novel deep learning-based approach for automated personality detection from text. We leverage state-of-the-art advances in natural language understanding, namely the BERT language model, to extract contextualized word embeddings from textual data for automated author personality detection. Our primary goal is to develop a computationally efficient, high-performance personality prediction model which can be easily used by a large number of people without access to huge computational resources. Our extensive experiments with this goal in mind led us to develop a novel model which feeds contextualized embeddings along with psycholinguistic features to a Bagged-SVM classifier for personality trait prediction. Our model outperforms the previous state of the art by 1.04% and, at the same time, is significantly more computationally efficient to train. We report our results on the famous gold-standard Essays dataset for personality detection.
doi:10.6084/m9.figshare.13012421.v1 fatcat:yzlcakzmgzhotjhfsbl34zarwy
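The abstract above describes feeding BERT contextualized embeddings, together with psycholinguistic features, into a bagged SVM. As a rough illustration only (not the authors' code), such a classifier over pre-computed feature vectors might look like the following scikit-learn sketch; the 768-dimensional embeddings, the 10 extra psycholinguistic features, and the random labels are all placeholders:

```python
# Hypothetical sketch: bagged SVM over pre-computed embedding features.
# Real inputs would be BERT sentence embeddings plus psycholinguistic features.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 768 + 10))   # stand-in feature vectors
y = rng.integers(0, 2, size=200)       # one binary personality trait

# Bagging: train several SVMs on bootstrap samples, predict by majority vote.
clf = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=0)
clf.fit(X, y)
preds = clf.predict(X)
print(preds.shape)  # (200,)
```

In a real setup each Big-Five trait would typically get its own binary classifier, and the feature matrix would come from a frozen BERT encoder rather than random numbers.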

Dramatically Reducing Training Data Size Through Vocabulary Saturation

William Lewis, Sauleh Eetemadi
2013 Conference on Machine Translation  
Our field has seen significant improvements in the quality of machine translation systems over the past several years. The single biggest factor in this improvement has been the accumulation of ever larger stores of data. However, we now find ourselves the victims of our own success, in that it has become increasingly difficult to train on such large sets of data, due to limitations in memory, processing power, and ultimately, speed (i.e., data to models takes an inordinate amount of time). Some teams have dealt with this by focusing on data cleaning to arrive at smaller data sets (Denkowski et al., 2012a; Rarrick et al., 2011), "domain adaptation" to arrive at data more suited to the task at hand (Moore and Lewis, 2010; Axelrod et al., 2011), or by specifically focusing on data reduction by keeping only as much data as is needed for building models, e.g., (Eck et al., 2005). This paper focuses on techniques related to the latter efforts. We have developed a very simple n-gram counting method that reduces the size of data sets dramatically, by as much as 90%, and is applicable independent of specific dev and test data. At the same time it reduces model sizes, improves training times, and, because it attempts to preserve contexts for all n-grams in a corpus, the cost in quality is minimal (as measured by BLEU). Further, unlike other methods created specifically for data reduction that have similar effects on the data, our method scales to very large data, up to tens to hundreds of millions of parallel sentences.
dblp:conf/wmt/LewisE13 fatcat:oiopc5tslrelron4hydj2bfnjy
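The n-gram counting idea sketched in the abstract (keep a sentence only while it still contributes n-grams that have not yet been seen often enough) can be illustrated as follows. This is a hypothetical simplification of the paper's method, with `n` and `threshold` as invented parameters, not the published algorithm:

```python
# Illustrative "vocabulary saturation" filter: keep a sentence only if it
# contains at least one n-gram seen fewer than `threshold` times so far.
from collections import Counter

def saturate(sentences, n=2, threshold=2):
    counts = Counter()
    kept = []
    for sent in sentences:
        toks = sent.split()
        ngrams = [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
        if any(counts[g] < threshold for g in ngrams):
            kept.append(sent)       # sentence still adds n-gram coverage
            counts.update(ngrams)   # only kept sentences advance the counts
    return kept

corpus = ["the cat sat", "the cat sat", "the cat sat", "a dog ran"]
print(saturate(corpus))  # ['the cat sat', 'the cat sat', 'a dog ran']
```

The third repetition of "the cat sat" is dropped because every one of its bigrams has already reached the threshold, while the novel sentence survives; this matches the abstract's point that the reduction is independent of any particular dev or test set.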

Personality Trait Detection Using Bagged SVM over BERT Word Embedding Ensembles [article]

Amirmohammad Kazameini, Samin Fatehi, Yash Mehta, Sauleh Eetemadi, Erik Cambria
2020 arXiv   pre-print
Recently, the automatic prediction of personality traits has received increasing attention and has emerged as a hot topic within the field of affective computing. In this work, we present a novel deep learning-based approach for automated personality detection from text. We leverage state-of-the-art advances in natural language understanding, namely the BERT language model, to extract contextualized word embeddings from textual data for automated author personality detection. Our primary goal is to develop a computationally efficient, high-performance personality prediction model which can be easily used by a large number of people without access to huge computational resources. Our extensive experiments with this goal in mind led us to develop a novel model which feeds contextualized embeddings along with psycholinguistic features to a Bagged-SVM classifier for personality trait prediction. Our model outperforms the previous state of the art by 1.04% and, at the same time, is significantly more computationally efficient to train. We report our results on the famous gold-standard Essays dataset for personality detection.
arXiv:2010.01309v1 fatcat:6bq5jaea6jc2fhij6cluwawzqu

Survey of data-selection methods in statistical machine translation

Sauleh Eetemadi, William Lewis, Kristina Toutanova, Hayder Radha
2015 Machine Translation  
A set function is modular if and only if the value of the function over a set equals the sum of the function value over its individual elements.  ...  There is too much parallel data to train and iterate on in a timely manner (Lewis and Eetemadi, 2013).  ...
doi:10.1007/s10590-015-9176-1 fatcat:e3nogrn42bbwtj4hfcurt72wme
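The snippet's definition of a modular set function can be checked concretely: if f(S) is defined as a sum of per-element weights, then f over any set equals the sum of f over its singletons. A minimal sketch, with invented weights purely for illustration:

```python
# A set function built from per-element weights is modular by construction:
# f(S) equals the sum of f over the singletons of S.
weights = {"a": 1.0, "b": 2.5, "c": -0.5}

def f(S):
    """Toy modular set function: sum of element weights."""
    return sum(weights[e] for e in S)

S = {"a", "c"}
assert f(S) == sum(f({e}) for e in S)  # modularity holds for this f
print(f({"a", "b", "c"}))  # 3.0
```

Modularity is the property that makes such functions trivial to optimize over subsets; submodular functions, common in data-selection work, relax it to diminishing returns.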

Asymmetric Features Of Human Generated Translation

Sauleh Eetemadi, Kristina Toutanova
2014 Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)  
Distinct properties of translated text have been the subject of research in linguistics for many years (Baker, 1993). In recent years, computational methods have been developed to empirically verify linguistic theories about translated text (Baroni and Bernardini, 2006). While many characteristics of translated text are more apparent in comparison to the original text, most of the prior research has focused on monolingual features of translated and original text. The contribution of this work is introducing bilingual features that are capable of explaining differences in translation direction using localized linguistic phenomena at the phrase or sentence level, rather than using monolingual statistics at the document level. We show that these bilingual features outperform the monolingual features used in prior work (Kurokawa et al., 2009) for the task of classifying translation direction.
doi:10.3115/v1/d14-1018 dblp:conf/emnlp/EetemadiT14 fatcat:nnt64sndincahgboqh2bzrqnum

Detecting Translation Direction: A Cross-Domain Study

Sauleh Eetemadi, Kristina Toutanova
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop  
Another feature we have chosen is from the work of Eetemadi and Toutanova (2014) where they achieve higher accuracy by introducing POS MTU 3 n-gram features.  ...  Prior works claim POS n-gram features capture linguistic phenomena of translation and should generalize across domains (Kurokawa et al., 2009; Eetemadi and Toutanova, 2014) .  ... 
doi:10.3115/v1/n15-2014 dblp:conf/naacl/EetemadiT15 fatcat:psmfwbh26rb4vilhxelphrpbqu

Pars-ABSA: an Aspect-based Sentiment Analysis dataset for Persian [article]

Taha Shangipour Ataei, Kamyar Darvishi, Soroush Javdan, Behrouz Minaei-Bidgoli, Sauleh Eetemadi
2019 arXiv   pre-print
Due to the increased availability of online reviews, sentiment analysis has witnessed booming interest from researchers. Sentiment analysis is the computational treatment of sentiment, used to extract and understand the opinions of authors. While many systems were built to predict the sentiment of a document or a sentence, many others provide the necessary detail on various aspects of the entity (i.e., aspect-based sentiment analysis). Most of the available data resources were tailored to English and the other popular European languages. Although Persian is a language with more than 110 million speakers, to the best of our knowledge, there is a lack of public datasets on aspect-based sentiment analysis for Persian. This paper provides a manually annotated Persian dataset, Pars-ABSA, which is verified by three native Persian speakers. The dataset consists of 5,114 positive, 3,061 negative and 1,827 neutral data samples from 5,602 unique reviews. Moreover, as a baseline, this paper reports the performance of some state-of-the-art aspect-based sentiment analysis methods, with a focus on deep learning, on Pars-ABSA. The obtained results are impressive compared to similar English state-of-the-art results.
arXiv:1908.01815v3 fatcat:smkftrskmrfz7oxu23tk3y5vzm

Bottom-Up and Top-Down: Predicting Personality with Psycholinguistic and Language Model Features

Yash Mehta, Samin Fatehi, Amirmohammad Kazameini, Clemens Stachl, Erik Cambria, Sauleh Eetemadi
2020 2020 IEEE International Conference on Data Mining (ICDM)  
State-of-the-art personality prediction with text data mostly relies on bottom-up, automated feature generation as part of the deep learning process. More traditional models rely on hand-crafted, theory-based text-feature categories. We propose a novel deep learning-based model which integrates traditional psycholinguistic features with language model embeddings to predict personality from the Essays dataset for Big-Five traits and the Kaggle dataset for MBTI. With this approach we achieve state-of-the-art model performance. Additionally, we use interpretable machine learning to visualize and quantify the impact of various language features in the respective personality prediction models. We conclude with a discussion on the potential this work has for computational modeling and psychological science alike.
doi:10.1109/icdm50108.2020.00146 fatcat:6b37ki6r7rhazbh36kciiq2vry

Unsupervised Domain Clusters in Pretrained Language Models [article]

Roee Aharoni, Yoav Goldberg
2020 arXiv   pre-print
Sauleh Eetemadi, William Lewis, Kristina Toutanova, and Hayder Radha. 2015. Survey of data-selection methods in statistical machine translation.  ...  For more related work on data selection and domain adaptation in the context of MT, see the surveys by Eetemadi et al. (2015) for SMT and, more recently, Chu and Wang (2018) for NMT.  ...
arXiv:2004.02105v2 fatcat:2n7zd3s3qfcr5gp2ra4rwvvogm

Joint Language and Translation Modeling with Recurrent Neural Networks

Michael Auli, Michel Galley, Chris Quirk, Geoffrey Zweig
2013 Conference on Empirical Methods in Natural Language Processing  
Acknowledgments We would like to thank Anthony Aue, Hany Hassan Awadalla, Jon Clark, Li Deng, Sauleh Eetemadi, Jianfeng Gao, Qin Gao, Xiaodong He, Will Lewis, Arul Menezes, and Kristina Toutanova for helpful  ... 
dblp:conf/emnlp/AuliGQZ13 fatcat:f2tyficjazg67mec5yxdtlq4ha