Web Information Extraction methods using Web Content Mining (WCM) for Webapplications

Raghavendra R, Dr. Niranjanamurthy M
2022 International Journal of Computing and Digital Systems  
In the digital era, the data generated by humans and machines is huge in volume and is accessed through websites on the internet. Most transactions involve web product items, web news, and web advertisements. Web Information Extraction (WIE) is the technique of extracting information from websites accurately and within a given time using Web Content Mining (WCM). Every second, new data is generated in different locations, and the contents of the ... es change rapidly at various intervals during processing. The live time and location of the data change each time internet users work with web applications, so extracting information from a web page or website accurately and with low latency is challenging. Classic algorithms and data mining techniques preprocess the generated data within a certain time, but the validity of that data is not maintained on the web server. However, their special features have been adopted for extraction using web mining techniques. Recent advances such as deep learning with Recurrent Neural Networks (RNN) are used to perform Web Information Extraction on various websites over a large network, keeping the status of the data at each second in memory during processing. Long Short-Term Memory (LSTM) holds this status in intermediate memory; the status for all data generated in web applications is then sent to the RNN for further classification. Classification methods are used in Artificial Neural Networks (ANN), which are trained on input data from the large network and segregate it according to the algorithms chosen by the user. Finally, deep learning is combined with recent trends, with input models serving as an embedded layer. With this technique, social media information stays up to date, and its originality and validity are fully tracked across larger networks. This paper suggests the best methods for implementing web information extraction in web content mining from different websites on larger clusters/networks using deep learning LSTM techniques.
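The abstract describes LSTM as holding a status in intermediate memory while a sequence is processed. The following is a minimal sketch of that idea only, not the paper's implementation: it assumes the standard LSTM gate equations, uses random untrained weights, and every name in it (`make_params`, `lstm_step`, `encode_sequence`) is hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_params(input_size, hidden_size, seed=0):
    """Random, untrained weights for the four LSTM gates (illustrative only)."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for gate in ("forget", "input", "candidate", "output")
    }


def affine(layer, vec):
    """Compute W @ vec + b with plain Python lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(layer["W"], layer["b"])]


def lstm_step(params, x, h_prev, c_prev):
    """One time step: update the cell state c (the held 'status') and output h."""
    z = x + h_prev  # concatenate the current input with the previous hidden state
    f = [sigmoid(a) for a in affine(params["forget"], z)]       # what to keep
    i = [sigmoid(a) for a in affine(params["input"], z)]        # what to write
    g = [math.tanh(a) for a in affine(params["candidate"], z)]  # new candidate values
    o = [sigmoid(a) for a in affine(params["output"], z)]       # what to expose
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def encode_sequence(params, sequence, hidden_size):
    """Run the cell over a sequence of feature vectors, carrying state forward."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
    return h, c
```

In a setting like the one described, each element of `sequence` might be a feature vector for one token or one snapshot of a page, and the final `h` could be passed to a classifier; both of those choices are assumptions made here for illustration.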
doi:10.12785/ijcds/110149 fatcat:cpkcklt2rzh6dhwpd45yktybge