Towards Practical PPM Spam Filtering: Experiments for the TREC 2006 Spam Track

Andrej Bratko, Bogdan Filipic, Blaz Zupan
2006 Text Retrieval Conference  
This paper summarizes our participation in the TREC 2006 spam track. We submitted a single filter for the evaluation, based on the Prediction by Partial Matching (PPM) compression scheme, a method that performed well in the previous TREC evaluation. A major focus of our effort was to improve the efficiency of the method, particularly in terms of memory consumption, in order to establish whether compression-based filters are in fact a viable solution for practical applications. Our system exhibited fair ... performance, despite the fact that the filtering techniques remained virtually unchanged from the previous evaluation. We did not investigate methods for tackling delayed user feedback. A very simple strategy of training on the most recent examples was used for the active learning task and was found to work surprisingly well.
dblp:conf/trec/BratkoFZ06 fatcat:ycdkaxro3jbinlt5vnmcuxu4xq