Experiments in Open Domain Deception Detection

Verónica Pérez-Rosas, Rada Mihalcea
2015 Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing  
The widespread use of deception in online sources has motivated the need for methods to automatically profile and identify deceivers. This work explores deception, gender and age detection in short texts using a machine learning approach. First, we collect a new open domain deception dataset also containing demographic data such as gender and age. Second, we extract feature sets including n-grams, shallow and deep syntactic features, semantic features, and syntactic complexity and readability metrics. Third, we build classifiers that aim to predict deception, gender, and age. Our findings show that while deception detection can be performed in short texts even in the absence of a predetermined domain, gender and age prediction in deceptive texts is a challenging task. We further explore the linguistic differences in deceptive content that relate to deceivers' gender and age and find evidence that both age and gender play an important role in people's word choices when fabricating lies.
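
As an illustration of the simplest feature set named in the abstract, word n-grams, the following minimal sketch counts unigrams and bigrams in a short text. It is not the authors' implementation: the function name `ngram_counts`, the lowercasing, the whitespace tokenization, and the example sentence are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(text, n_values=(1, 2)):
    """Count word n-grams in a short text.

    Assumption: tokens are lowercased and split on whitespace; the
    paper's actual preprocessing is not described in the abstract.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in n_values:
        # Slide a window of size n over the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical example sentence, not taken from the dataset.
features = ngram_counts("I never lie and I never cheat")
```

Counts of this kind could then serve as input to any standard classifier; the abstract does not say which learning algorithm was used.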
doi:10.18653/v1/d15-1133 dblp:conf/emnlp/Perez-RosasM15 fatcat:irtjxmrjavesjemfsuajri2rum