Semantically driven snippet selection for supporting focused web searches

Iraklis Varlamis, Sofia Stamou
2009 Data & Knowledge Engineering  
Millions of people access the plentiful web content to locate information that is of interest to them. Searching is the primary web access method for many users. During search, users visit a web search engine and use an interface to specify a query (typically comprising a few keywords) that best describes their information need. Once a query is issued, the engine's retrieval modules identify a set of potentially relevant pages in the engine's index and return them to the users, ordered to reflect the pages' relevance to the query keywords. Currently, all major search engines display search results as a ranked list of URLs (pointing to the relevant pages' physical location on the web) accompanied by the returned pages' titles and small text fragments that summarize the context of the search keywords. Such text fragments are widely known as snippets, and they offer a glimpse of the returned pages' contents. In general, text snippets extracted from the retrieved pages are an indicator of the pages' usefulness to the query intention, and they help users browse search results and decide which pages to visit. Thus far, the extraction of text snippets from the returned pages' contents has relied on statistical methods to determine which text fragments contain most of the query keywords. Typically, the first two text nuggets in the page's contents that contain the query keywords are merged to produce the final snippet that accompanies the page's title and URL in the search results. Unfortunately, statistically generated snippets are not always representative of the pages' contents, and they are not always closely related to the query intention. Such text snippets might mislead web users into visiting pages of little interest or usefulness to them. In this article, we propose a snippet selection technique that identifies, within the contents of the query-relevant pages, those text fragments that are both highly relevant to the query intention and expressive of the pages' entire contents. The motive for our work is to help web users make informed decisions before clicking on a page in the list of search results. Towards this goal, we first show how to analyze search results in order to decipher the query intention. Then, we process the content of the query-matching pages in order to identify text fragments that correlate highly with the query semantics.
Finally, we evaluate the query-related text fragments in terms of coherence and expressiveness and pick from every retrieved page the text nugget that correlates highly with the query intention and is also highly representative of the page's content. A thorough evaluation over a large number of web pages and queries suggests that the proposed snippet selection technique extracts good-quality text snippets with high precision and recall, outperforming existing snippet selection methods. Our study also reveals that the snippets delivered by our method can help web users decide which results to click. Overall, our study suggests that semantically driven snippet selection can be used to augment traditional snippet extraction approaches, which depend mainly on the statistical properties of words within a text.
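
As a rough illustration of the statistical baseline the abstract describes (merging the first two text nuggets that contain the query keywords), the following minimal Python sketch may help. It is not the authors' method or any search engine's implementation. The function name `baseline_snippet`, the use of sentences as "text nuggets", and the `" ... "` joiner are all assumptions made for illustration.

```python
import re


def baseline_snippet(page_text, query, max_fragments=2, joiner=" ... "):
    """Merge the first fragments of page_text that mention a query keyword.

    Hypothetical helper: a sketch of keyword-based snippet extraction,
    not the semantically driven technique proposed in the paper.
    """
    keywords = {w.lower() for w in re.findall(r"\w+", query)}

    # Assumption: a "text nugget" is approximated by a sentence.
    fragments = [
        s.strip()
        for s in re.split(r"(?<=[.!?])\s+", page_text)
        if s.strip()
    ]

    selected = []
    for fragment in fragments:
        words = {w.lower() for w in re.findall(r"\w+", fragment)}
        if keywords & words:  # the fragment shares at least one keyword
            selected.append(fragment)
            if len(selected) == max_fragments:
                break
    return joiner.join(selected)


page = (
    "Jaguar cars are built in England. The weather is mild. "
    "The jaguar is a large cat. Jaguar habitats are shrinking."
)
print(baseline_snippet(page, "jaguar"))
```

In this toy page the sketch returns the first two sentences that mention the keyword, whichever sense of "jaguar" the user intended, which is the kind of mismatch with query intention that the abstract attributes to statistically generated snippets.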
doi:10.1016/j.datak.2008.10.002 fatcat:7kvqvexu7rb7rhtdv6fsehtiwy