Design and Implementation of Search Engine Based on JAVA Technology

2017 International Conference on Computing, Communications and Automation (I3CA 2017), unpublished
A search engine is an Internet information retrieval system. It crawls large amounts of information from the network, extracts it intelligently, analyzes its quality, and builds an index that is loaded into an index database. When a user submits a query, the engine applies its algorithms to find the matching information in the index data and finally returns the pages that contain all of the query keywords. Throughout this process, search engines use a variety of specialized algorithms to rank the information by relevance and return it to the client in order from most to least relevant. Innovations in JAVA technology have given search engine development new impetus and pushed it to a higher level. This paper presents the design and implementation of a search engine based on JAVA technology.

The Composition of the Search Engine

A search engine is essentially a kind of database. Its operation combines automatic information collection with regular searching. Google, for example, periodically sends out spiders that actively crawl the web; when they discover a new site, they extract the relevant information and store it in the database. Continuously updating the database in this way lets a search engine keep widening its coverage and makes it more convenient for users. Specifically, a search engine consists of a parser, an indexer and searcher, and a Web server. The parser handles HTML, PDF, Word, Excel, and other documents. Preprocessing a document involves more than reading its characters from the file: the relevant content must be extracted according to each document's format, using the corresponding open-source parsing module to obtain the text. The search engine uses JDBC to write document attributes such as title, author, and keywords into the database. Before writing, it calls the getNextId method to obtain the ID of the record to be inserted. This ID is returned to the caller, which passes it to the Lucene index so that each index entry corresponds to a database record. Crawled pages are first held in a temporary database; an index is then built and stored in inverted-file format to make queries efficient.
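The indexing and retrieval path described above (extract text from a document, obtain an ID, record its attributes, add its terms to an inverted index, then return matches ranked from high to low) can be illustrated with a small in-memory Java sketch. It is not the paper's implementation and uses neither Lucene nor JDBC: the `titles` map stands in for the database table, the `nextId` counter is a hypothetical substitute for the getNextId call, the regex tag stripper stands in for a real HTML parser, and the score is a plain sum of term frequencies rather than Lucene's relevance scoring.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MiniSearchEngine {
    // Inverted index: term -> (document ID -> frequency of the term in that document).
    private final Map<String, Map<Integer, Integer>> invertedIndex = new HashMap<>();
    // Document ID -> title; a stand-in for the database record the ID refers to.
    private final Map<Integer, String> titles = new HashMap<>();
    // Hypothetical counter standing in for the getNextId call described in the text.
    private int nextId = 1;

    // Naive parser stand-in: replace every HTML tag with a space.
    public static String htmlToText(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Lower-case the text and split it on runs of non-alphanumeric characters.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Assign an ID, store the title, and add each term occurrence to the inverted index.
    public int addDocument(String title, String html) {
        int id = nextId++;
        titles.put(id, title);
        for (String term : tokenize(htmlToText(html))) {
            invertedIndex.computeIfAbsent(term, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        return id;
    }

    // Score each matching document by the summed frequencies of the query terms,
    // then return titles from the highest score to the lowest.
    public List<String> search(String query) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : tokenize(query)) {
            Map<Integer, Integer> postings = invertedIndex.getOrDefault(term, Map.of());
            for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
                scores.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<Integer, Integer>> ranked = new ArrayList<>(scores.entrySet());
        // Descending score; ties broken by ascending document ID so the order is deterministic.
        ranked.sort((a, b) -> a.getValue().equals(b.getValue())
                ? Integer.compare(a.getKey(), b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : ranked) {
            result.add(titles.get(e.getKey()));
        }
        return result;
    }

    public static void main(String[] args) {
        MiniSearchEngine engine = new MiniSearchEngine();
        engine.addDocument("Java intro", "<html><body><p>Java is a language. Java runs on the JVM.</p></body></html>");
        engine.addDocument("Search basics", "<p>A search engine builds an inverted index.</p>");
        engine.addDocument("Java search", "<p>Building a search engine in Java.</p>");
        // prints [Java intro, Java search, Search basics]
        System.out.println(engine.search("java search"));
    }
}
```

Each posting stores the document ID rather than the document itself, which is what lets a search hit be joined back to its stored record; that mirrors the role the text assigns to the ID passed to the Lucene index.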
The user enters search conditions into the search program, which retrieves results from the index database, ranks them according to certain criteria, and returns them to the user. Users query information through a browser: the Web server connects to the index database, receives the user's query conditions, runs the query against the index database, sorts the results, and returns them to the user to complete the search. The search engine workflow therefore has four stages: crawling web pages from the network, building the index database, retrieving information from the index database, and finally processing and sorting the search results before returning them to the client.

The Advantages of JAVA Technology

Compared with other programming languages, JAVA's advantages lie mainly in the following aspects. First, security. In a networked environment security is of great importance, and JAVA's security mechanisms can effectively defend against malicious code and protect information to the greatest possible extent. Second, it enforces design discipline. In JAVA's object-oriented model a class can inherit from only one superclass, so multiple inheritance of type must be achieved by implementing multiple interfaces. Third, it is the …
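The second point, single class inheritance combined with multiple interfaces, can be seen in a short Java example. The type names (Parser, Indexer, Component, HtmlParserIndexer) are hypothetical and chosen only to echo the search engine components discussed above; they do not come from the paper.

```java
import java.util.HashMap;
import java.util.Map;

// Two hypothetical capabilities expressed as interfaces.
interface Parser {
    String parse(String raw);
}

interface Indexer {
    void index(int docId, String text);
}

// A hypothetical base class; a Java class can extend at most one class.
abstract class Component {
    private final String name;

    protected Component(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }
}

// Single inheritance (extends Component) combined with multiple interfaces.
class HtmlParserIndexer extends Component implements Parser, Indexer {
    private final Map<Integer, String> store = new HashMap<>();

    HtmlParserIndexer() {
        super("html-parser-indexer");
    }

    @Override
    public String parse(String raw) {
        // Naive tag stripping, for illustration only.
        return raw.replaceAll("<[^>]*>", "").trim();
    }

    @Override
    public void index(int docId, String text) {
        store.put(docId, text);
    }

    public int size() {
        return store.size();
    }
}

public class InheritanceDemo {
    public static void main(String[] args) {
        HtmlParserIndexer component = new HtmlParserIndexer();
        Parser parser = component;   // the same object is usable through either interface type
        Indexer indexer = component;
        indexer.index(1, parser.parse("<p>hello</p>"));
        // prints "html-parser-indexer indexed 1 document(s)"
        System.out.println(component.getName() + " indexed " + component.size() + " document(s)");
        // Declaring "extends Component, SomeOtherClass" would not compile.
    }
}
```

Code that depends only on the Parser or Indexer interface can accept this object without knowing its concrete class, which is how Java recovers most of the flexibility of multiple inheritance while keeping a single class hierarchy.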
doi:10.25236/i3ca.2017.15