Enhancing information retrieval through concept-based language modeling and semantic smoothing

Lynda Said Lhadj, Mohand Boughanem, Karima Amrouche
2015 Journal of the Association for Information Science and Technology  
Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have shown good performance, the word independence assumption is unrealistic from a natural language point of view, in which terms are related to each other. This assumption leads to two well-known problems in information retrieval (IR), namely polysemy and synonymy (the latter also known as term mismatch). In language models, these issues have been addressed by considering dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated using simple n-gram or concept counting. In this paper, we address the polysemy and synonymy problems with a concept-based language modeling approach that combines ontological concepts from external resources with frequently found collocations from the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
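The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes (1) a hypothetical concept vocabulary `CONCEPTS` standing in for ontology entries and collection collocations, matched by greedy longest match; (2) a translation-style semantic smoothing, P_sem(c|d) = (1-λ)·P_ml(c|d) + λ·Σ_c' T(c|c')·P_ml(c'|d), where the hypothetical table `RELATIONS` stands in for subconcept and semantic relationships; and (3) Jelinek-Mercer interpolation with an add-one collection model. The names `segment`, `semantic_model`, `score`, `lam`, and `beta` are illustrative, not the authors' estimator.

```python
import math
from collections import Counter

# Hypothetical concept vocabulary (assumption): multiword concepts such as
# ontology entries or frequent collocations, stored as tuples of words.
CONCEPTS = {("information", "retrieval"), ("language", "model")}
MAX_LEN = max(len(c) for c in CONCEPTS)

# Hypothetical semantic relations (assumption): RELATIONS[src] is a
# distribution T(. | src) summing to 1; units absent here map to themselves.
RELATIONS = {
    "information_retrieval": {"information_retrieval": 0.8, "search": 0.2},
    "search": {"search": 0.7, "information_retrieval": 0.3},
}


def segment(text):
    """Greedy longest-match segmentation into concepts (joined by '_') and single words."""
    words = text.lower().split()
    units, i = [], 0
    while i < len(words):
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            cand = tuple(words[i:i + n])
            if n == 1 or cand in CONCEPTS:
                units.append("_".join(cand))
                i += n
                break
    return units


def semantic_model(doc_units, lam):
    """P_sem(c|d) = (1 - lam) P_ml(c|d) + lam * sum_c' T(c|c') P_ml(c'|d)."""
    counts = Counter(doc_units)
    total = sum(counts.values())
    p = {}
    for src, n in counts.items():
        p_ml = n / total
        p[src] = p.get(src, 0.0) + (1 - lam) * p_ml
        # Spread lam of the source's mass over its semantically related units.
        for tgt, t in RELATIONS.get(src, {src: 1.0}).items():
            p[tgt] = p.get(tgt, 0.0) + lam * t * p_ml
    return p


def score(query, doc, collection, lam=0.3, beta=0.5):
    """Query log-likelihood under the semantically smoothed document model (beta > 0)."""
    d_model = semantic_model(segment(doc), lam)
    coll = Counter(u for text in collection for u in segment(text))
    coll_total = sum(coll.values())
    s = 0.0
    for u in segment(query):
        # Add-one collection estimate so unseen query units keep nonzero mass.
        p_coll = (coll[u] + 1) / (coll_total + len(coll) + 1)
        s += math.log((1 - beta) * d_model.get(u, 0.0) + beta * p_coll)
    return s


docs = [
    "information retrieval with a language model",
    "web search engines rank pages",
    "cooking pasta at home",
]
ranked = sorted(docs, key=lambda d: score("information retrieval", d, docs), reverse=True)
print(ranked)
```

In this toy setup the second document contains no query word, yet it outranks the third because the assumed `search` relation moves some probability mass onto the `information_retrieval` concept. This is the kind of semantic matching the abstract describes. A realistic system would precompute the collection model once instead of re-segmenting the collection on every call.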
doi:10.1002/asi.23553