Intelligent Search for Image Information on the Web through Text and Link Structure Analysis [chapter]

Euripides G.M. Petrakis
2008 Multimodal Processing and Interaction  
Searching for effective methods to retrieve information from the World Wide Web (WWW) has been at the center of many research efforts during the last few years. The relevant technology evolved rapidly thanks to advances in Web systems technology [1] and information retrieval research [15]. Image retrieval on the Web, in particular, is a very important problem in itself [8]. The relevant technology has also evolved significantly, propelled by advances in image database research [20]. Several
approaches to the problem of content-based image retrieval on the Web have been proposed, and some have been implemented in research prototypes (e.g., ImageRover [23], WebSEEK [21]) and commercial systems. The latter category includes general-purpose image search engines (e.g., Google Image Search, Yahoo, AltaVista) as well as systems providing specific services to users, such as detection of unauthorized use of images, Web and e-mail content filters, image authentication, licensing, and advertising. Image retrieval on the Web requires that content descriptions be extracted from Web pages and used to determine which Web pages contain images that satisfy the query selection criteria. The methods and systems referred to above differ in the type of content descriptions used and in the search methods applied. There are four main approaches to Web image search and retrieval. Retrieval by text content: Typically, images on the Web are described by text or attributes associated with images in HTML tags (e.g., filename, caption, alternate text, etc.). These are automatically extracted from the Web pages and are used in retrievals. Google, Yahoo, and AltaVista are example systems of this category. The importance of the various text fields in retrieving images by text content also depends on their location relative to the images within the Web pages [19].
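
As an illustration of retrieval by text content, the following is a minimal sketch of how text attributes associated with images might be extracted from a Web page. It is not the method of any system cited above. The names `ImageTextExtractor` and `extract_image_text` are hypothetical, and the sketch assumes that only the `src`, `alt`, and `title` attributes of `img` tags are of interest; captions and other surrounding text, as well as their position relative to the image, are not handled.

```python
from html.parser import HTMLParser
from posixpath import basename
from urllib.parse import urlparse


class ImageTextExtractor(HTMLParser):
    """Collects text attributes of <img> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src") or ""
        self.images.append({
            # The file name is taken from the path part of the image URL.
            "filename": basename(urlparse(src).path),
            # Alternate text and title, if present, else empty strings.
            "alt": attr.get("alt") or "",
            "title": attr.get("title") or "",
        })


def extract_image_text(html):
    """Return one dict of text fields per <img> tag found in html."""
    parser = ImageTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.images


if __name__ == "__main__":
    page = '<p>A <img src="/pics/red_fox.jpg" alt="Red fox in snow"> photo</p>'
    for image in extract_image_text(page):
        print(image)
```

The extracted fields could then be indexed with a conventional text retrieval method, so that a text query is matched against the file name and alternate text of each image.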
