Towards information system development for data extraction from web

Yulia Mukolaivna Gontar, Kateryna Victorivna Tkach, Bohdan Oleksandrovych Yena, Artem Victorovych Vasylenko
2018 Bulletin of National Technical University KhPI Series System Analysis Control and Information Technologies  
Today, the Internet contains a huge number of sources of information, which is constantly used in our daily lives. It often happens that similar in meaning information is presented in different forms on different resources (for example, electronic libraries, online stores, news sites and etc.). In this paper, we analyze the extraction of information from certain type of web sources that is required by the user. The analysis of the data extraction problem was carried out. When considering the
more » ... n approaches to data extraction, the strengths and weaknesses of each were identified. The main aspects of the extraction of web knowledge were formulated. Approaches and information technologies for solving problems of syntactic analysis based on existing information systems are analyzed. Based on the analysis, the task of developing models and software components for extracting data from certain types of web resources were solving. A conceptual model of extracting data was developed taking into account web space as an external data source. A requirements specification for the software component was created, which will allow to continue working on the project and to clearly understand the requirements and constraints for implementation. During the process of modeling software, the following diagrams have been developed, such as activities, sequences and deployments, which will then be used to create the finished software application. For further development of the software, a programming platform and types of testing (load and modular) were defined. The obtained results allow to state that the proposed design solution, which will be implemented as a prototype of the software system, can perform the task of extracting data from different sources on the basis of a single semantic template.
doi:10.20998/2079-0023.2018.22.08 fatcat:sqmu2j6lh5cyfdmzcrgtzq4dwm