Internet Categorization and Search: A Self-Organizing Approach

Hsinchun Chen, Chris Schuffels, Richard Orwig
1996 Journal of Visual Communication and Image Representation  
The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services. The main information retrieval mechanisms provided by the prevailing Internet WWW software are based on either keyword search (e.g., the Lycos server at CMU, the Yahoo server at Stanford) or hypertext browsing (e.g., Mosaic and Netscape). This research aims to provide an alternative concept-based categorization and search capability for WWW servers based on selected machine learning algorithms. Our proposed approach, which is grounded on automatic textual analysis of Internet documents (homepages), attempts to address the Internet search problem by first categorizing the content of Internet documents. We report results of our recent testing of a multilayered neural network clustering algorithm employing the Kohonen self-organizing feature map to categorize (classify) Internet homepages according to their content. The category hierarchies created could serve to partition the vast Internet services into subject-specific categories and databases and improve Internet keyword searching and/or browsing.

... that is used by searchers of varying backgrounds a more intelligent and proactive search aid is needed. The problems of information overload and vocabulary differences have become more pressing with the emergence of increasingly popular Internet services [47, 24]. Although Internet protocols such as WWW/http support significantly easier importation and fetching of online information sources, their use is accompanied by the problem of users not being able to explore and find what they want in an enormous information space [2, 6, 55]. While the Internet services are popular and appealing to many online users, difficulties with search on Internet, we believe, will worsen as the amount of online information increases. We consider that devising a scalable approach to Internet search is critical to the success of Internet services and other current and future national information infrastructure applications. The main information retrieval mechanisms provided by the prevailing Internet WWW-based software are based ...
doi:10.1006/jvci.1996.0008 fatcat:dlisw7bpercglkq6w3llzabdii