Automatically infer subject terms and documents associations through text mining

Kun Lu, Jin Mao
2013 Proceedings of the American Society for Information Science and Technology  
Subject indexing is an intellectually intensive process that bears many inherent uncertainties. Existing subject indexing systems generally produce binary outcomes, recording only whether or not an indexing term is assigned, which does not sufficiently reflect the extent to which the indexing terms are associated with documents. On the other hand, probabilistic models have seen great success in capturing the uncertainties in the automatic indexing process. One hurdle to achieving weighted indexing in the manual subject indexing process is the practical burden that it could add to the already intensive indexing process. In this study, we propose a method to automatically infer the associations between subject terms and documents through text mining. By uncovering the connections between MeSH terms and document text, we are able to derive the weights of MeSH terms in documents. Our initial results suggest that the new method is feasible and promising. The study has practical implications for improving subject indexing practice.
doi:10.1002/meet.14505001133 fatcat:xlcwpnrklrev3lm6bybcwgwdea