Challenges in automated deception detection in computer-mediated communication
Proceedings of the American Society for Information Science and Technology
Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader LIS community. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted, computer-mediated communication, information seeking, credibility assessment and decision making. The goal of this ongoing research is to inform creation of such automated capabilities. In this study we
... icit a sample of 90 computer-mediated personal stories with varying levels of deception. Each story has 10 associated human judgments, confidence scores, and explanations. In total, 990 unique respondents participated in the study. Three analytical approaches are applied: human judgment accuracy, linguistic cue detection, and machine learning. Comparable to previous research results, human judges achieve 50-63% success rates. Actual deception levels negatively correlate with their confident judgments as being deceptive (r=-0.35, df=88, p=0.008). The best-performing machine learning algorithms reach 65% accuracy. Linguistic cues are extracted, calculated, and modeled with logistic regression, but are found not to be significant predictors of deception level or confidence score. We address the associated challenges with error analysis of the respondents' stories, and prose a faceted deception classification (theme, centrality, realism, essence, distancing) as well as a typology for stated perceived cues for deception detection (world knowledge, logical contradiction, linguistic evidence, and intuitive sense).