Classifying documents with link-based bibliometric measures

T. Couto, N. Ziviani, P. Calado, M. Cristo, M. Gonçalves, E. S. de Moura, W. Brandão
2009 Information retrieval (Boston)  
Automatic document classification can be used to organize documents in a digital library, construct on-line directories, improve the precision of web searching, or help interactions between users and search engines. In this paper we explore how linkage information inherent to different document collections can be used to enhance the effectiveness of classification algorithms. We have experimented with three link-based bibliometric measures, co-citation, bibliographic coupling and Amsler, on three different document collections: a digital library of computer science papers, a web directory and an on-line encyclopedia. Results show that both hyperlink and citation information can be used to learn reliable and effective classifiers based on a kNN classifier. In one of the test collections used, we obtained improvements of up to 69.8% in macro-averaged F1 over the traditional text-based kNN classifier, which served as the baseline in our experiments. We also present alternative ways of combining bibliometric-based classifiers with text-based classifiers. Finally, we conducted studies to analyze the situations in which the bibliometric-based classifiers failed, and show that in such cases it is hard to reach consensus regarding the correct classes, even for human judges.
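The abstract names the three bibliometric measures, a kNN classifier and macro-averaged F1 without defining them. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. It uses a hypothetical toy citation graph (all document ids and class labels are invented). Co-citation compares the sets of documents that cite d1 and d2, bibliographic coupling compares the sets of documents that d1 and d2 cite, and Amsler compares the union of both link sets. Each is normalized here with a Jaccard ratio. That normalization, the similarity-weighted kNN vote, and every helper name (`parents_index`, `knn_predict`, `macro_f1`, ...) are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy citation graph: doc -> set of docs it cites ("children").
# All ids are invented for illustration only.
CITES = {
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"z"},
    "x": set(),
    "y": set(),
    "z": set(),
}


def parents_index(cites):
    """Invert the graph: doc -> set of docs that cite it ("parents")."""
    parents = defaultdict(set)
    for src, targets in cites.items():
        for dst in targets:
            parents[dst].add(src)
    return parents


def jaccard(a, b):
    """|a & b| / |a | b|, or 0.0 when both sets are empty (assumed normalization)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_citation(d1, d2, parents):
    # Overlap of the documents that cite both d1 and d2.
    return jaccard(parents.get(d1, set()), parents.get(d2, set()))


def bibliographic_coupling(d1, d2, cites):
    # Overlap of the documents cited by both d1 and d2.
    return jaccard(cites.get(d1, set()), cites.get(d2, set()))


def amsler(d1, d2, cites, parents):
    # Overlap of all linked documents (cited or citing) of d1 and d2.
    n1 = cites.get(d1, set()) | parents.get(d1, set())
    n2 = cites.get(d2, set()) | parents.get(d2, set())
    return jaccard(n1, n2)


def knn_predict(doc, train_labels, sim, k=3):
    """Similarity-weighted kNN vote over labelled training docs (assumed variant)."""
    scored = [(sim(doc, t), t) for t in train_labels if t != doc]
    scored = [(s, t) for s, t in scored if s > 0]  # ignore unlinked neighbours
    scored.sort(reverse=True)
    votes = defaultdict(float)
    for s, t in scored[:k]:
        votes[train_labels[t]] += s
    return max(votes, key=votes.get) if votes else None


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    classes = set(y_true)
    total = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        total += (2 * prec * rec / (prec + rec)) if prec + rec else 0.0
    return total / len(classes)


if __name__ == "__main__":
    parents = parents_index(CITES)
    print(co_citation("x", "y", parents))  # 1.0 (both cited by a and b)
    train = {"x": "ml", "z": "db"}  # hypothetical labels
    print(knn_predict("y", train, lambda d1, d2: co_citation(d1, d2, parents)))  # ml
```

Swapping the `sim` argument of `knn_predict` for `bibliographic_coupling` or `amsler` gives one classifier per measure, which is the kind of per-measure comparison the abstract describes; how the paper actually combines these with text-based scores is not specified here.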
doi:10.1007/s10791-009-9119-7