Information agent technology for the Internet: A survey
Data & Knowledge Engineering
The vast amount of heterogeneous information sources available on the Internet demands advanced solutions for acquiring, mediating, and maintaining relevant information for the common user. Intelligent information agents are autonomous computational software entities that are especially meant (1) to provide proactive resource discovery, (2) to resolve the information impedance between information consumers and providers, and (3) to offer value-added information services and products. These agents are supposed to cope with the difficulties associated with the information overload of the user, preferably just in time. Based on a systematic classification of intelligent information agents, this paper presents an overview of the basic key enabling technologies needed to build such agents, together with examples of information agent systems currently deployed on the Internet.

Keywords: Intelligent agents, semantic information brokering, personal assistants, cooperative information systems, agent-mediated electronic business

Other challenges include, for example, how to cope with the problems of the cross-social, cross-cultural, and multi-lingual cyberspace. Regarding efficiency in time, the impact of ongoing efforts to increase transfer rates in the Internet-2 (Next Generation Internet) to 9.6 Gbit/s on the situation of the common user of the public Internet in the next couple of years remains unclear. Information agent technology (IAT) emerged as a major part of the more general intelligent software agent technology [144, 154] around seven years ago, in response to the challenges mentioned above from both the technological and the human user perspective. As such, IAT is an inherently interdisciplinary technology encompassing approaches, methods, and tools from different research disciplines and sub-fields such as artificial intelligence, advanced database and knowledge base systems, distributed information systems, information retrieval, and human-computer interaction. The driving idea of IAT is the development and the effective, efficient utilization of autonomous computational software entities, called intelligent information agents, which have access to multiple, heterogeneous, and geographically distributed information sources such as the Internet or corporate intranets.
The main task of such agents is to perform proactive searches for, maintain, and mediate relevant information on behalf of their users or other agents. This includes skills such as retrieving, analyzing, manipulating, and fusing heterogeneous information, as well as visualizing the available individual information space and guiding the user through it.

Web Indices and Search Bots

The most prominent solutions for finding relevant information on the Internet include monolithic Web indices such as Gopher and Harvest, as well as search engines and (meta-)search bots. Search bots like AltaVista, Lycos, InfoSeek, Excite, or HotBot use basic information retrieval techniques [141, 81, 49] to automatically gather information from indexed Web pages, maintain and periodically update their own index database, and provide a rating-based one-time query answering mechanism to the user. Each bot has a proprietary method for recursively traversing hyperlinks, starting from a given initial list of URLs, and for ranking retrieved documents. The information quality of the result relies not only on the ontological organization of, size of, and methods of access to the internal index, but also on the expressiveness of the query language the user is forced to use to formulate inquiries to the bot. Among the main limitations of search bots are that they do not behave proactively, due to their one-shot answering mechanism, and that they provide only a rather simple query language in terms of regular expressions over phrases and keywords. Each search bot has its own idiosyncratic interface the user has to deal with, and, finally, most of the prominent search bots cover at most about 30% of the Web, including up to 5% invalid or broken links [76, 190]. Meta-search bots such as MetaCrawler, SavvySearch, Ahoy!, Remora, or WebMate execute a given query concurrently over a variety of search bots, then merge and present the results in a homogeneous, ranking-based view to the user.
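The meta-search scheme just described can be sketched as follows. This is a minimal illustration, not any particular bot's method: the two backends are purely hypothetical stand-ins for real search bots, and the merge rule (keep the best score seen per URL) is one simple choice among many.

```python
# Minimal meta-search sketch: fan a query out to several (hypothetical)
# search backends concurrently, then merge the ranked result lists into
# one homogeneous, ranking-based view for the user.
from concurrent.futures import ThreadPoolExecutor

def backend_a(query):
    # Stand-in backend: returns (url, score) pairs with scores in [0, 1].
    return [("http://a.example/1", 0.9), ("http://shared.example", 0.6)]

def backend_b(query):
    return [("http://shared.example", 0.8), ("http://b.example/2", 0.5)]

def meta_search(query, backends):
    merged = {}
    # Query all backends concurrently, as a meta-search bot would.
    with ThreadPoolExecutor() as pool:
        for results in pool.map(lambda b: b(query), backends):
            for url, score in results:
                # Merge rule: keep the best score seen for each URL.
                merged[url] = max(merged.get(url, 0.0), score)
    # One homogeneous view, best hits first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
```

Duplicates across backends collapse to a single entry, which is one way such bots enlarge the search space without cluttering the merged ranking.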
This allows the user to enlarge the individual search space and may increase the hit rate for some queries. According to , search bots like Excite, HotBot, and Lycos use certain page importance metrics for ranking retrieved Web pages. These include the

• backlink count, measuring the number of links (in-links) to a page p that appear over the entire Web. The more pages link to p, the greater p's importance; since all links are treated equally, equally important but small fields are pushed out by sheer volume of links.

• page-rank backlink metric, recursively measuring the weighted sum of the in-links to a page p, thereby exaggerating the above problem: the more pages with a high backlink count themselves link to p, the greater p's importance.

According to the definition and classification of information agents, we can differentiate between communication, knowledge, collaboration, and rather low-level task skills, as depicted in figure 1. In that figure, the corresponding key enabling technologies are listed below each of the different types of skills. The communication skills of an information agent comprise communication with information systems and databases, with human users, or with other agents. In the latter case, the use of an agent communication language has to be considered on top of, for example, middleware platforms or specific APIs.
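The two page importance metrics described above can be sketched on a toy link graph. This is an illustrative approximation, not the actual implementation of any of the cited search bots; the graph data, damping factor, and iteration count are assumptions for the sketch.

```python
# Sketch of the two metrics; graph[p] is the set of pages p links to.
def backlink_count(graph):
    # Backlink count: number of in-links to each page, all links equal.
    counts = {p: 0 for p in graph}
    for p, outs in graph.items():
        for q in outs:
            counts[q] = counts.get(q, 0) + 1
    return counts

def pagerank_backlink(graph, damping=0.85, iters=50):
    # Page-rank backlink metric: the weighted sum of in-link importance,
    # computed iteratively, so a link from an important page counts more.
    n = len(graph)
    rank = {p: 1.0 / n for p in graph}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in graph}
        for p, outs in graph.items():
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        rank = new
    return rank
```

On a graph where page "a" collects the most in-links, both metrics rank it first; they differ once the linking pages themselves vary in importance, which is exactly the effect the recursive metric introduces.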