A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
Camouflaged Chinese Spam Content Detection with Semi-supervised Generative Active Learning
2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
unpublished
We propose a Semi-supervIsed GeNerative Active Learning (SIGNAL) model to address the imbalance, efficiency, and text camouflage problems of the Chinese text spam detection task. A "self-diversity" criterion is proposed for measuring the "worthiness" of a candidate for annotation. A semi-supervised variational autoencoder with a masked attention learning approach and a character variation graph-enhanced augmentation procedure are proposed for data augmentation. The preliminary experiment demonstrates
doi:10.18653/v1/2020.acl-main.279
fatcat:vd5q67cgmnddrbqg3hk2gofk34