A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit the original URL.
The file type is
Large-scale spam emails cause a serious waste of time and resources. This paper investigates the text features of email documents and the feature noises among multi-field texts, resulting in an observation of a power law distribution of feature strings within each text field. According to the observation, we propose an efficient filtering approach including a compound weight method and a lightweight field text classification algorithm. The compound weight method considers both the historicaldoi:10.1080/18756891.2012.696915 fatcat:cfuwvpqlxvawbj7jm3qtpkwomy