Text classification stream-based R-measure approach using frequency of substring repetition

Mikhail F. Ashurov, Vasiliy V. Poddubny
2015 Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitel'naya tekhnika i informatika
A stream-based variant of the R-measure that uses the frequency of substring repetition is proposed for text classification. Classifiers based on the truncated R-measure, with and without the frequencies of test text substring repetition, are compared on a corpus of Russian fiction from the 19th century and the 1990s. Classification accuracy is estimated with Van Rijsbergen's effectiveness measure, known as the F-measure. The results confirm that, in the case of genre ... xing free into author's text classes, accounting for the frequency of test text substring repetition in supertexts increases classification accuracy.
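The F-measure named in the abstract can be illustrated with a minimal sketch. The abstract does not state the weighting or how scores are aggregated over author classes, so the balanced setting `beta=1.0` and the per-class counts `tp`, `fp`, `fn` are assumptions made here for illustration only.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Van Rijsbergen's F-measure for a single class.

    tp, fp, fn are hypothetical counts of true positives, false positives
    and false negatives for one author class; beta=1.0 (the balanced F1)
    is an assumption, not a setting taken from the paper.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was classified correctly
    b2 = beta * beta
    # Weighted harmonic mean of precision and recall.
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

With `beta=1.0` this reduces to the harmonic mean of precision and recall, so a classifier scores well only when both are high.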
doi:10.17223/19988605/33/1 fatcat:zystzkvgqvcrvnhmps36bzijke