An Improved Optimal Method for Classification Problem

Huang Wei, Dong Xiao, Shang Wenqian, Lin Weiguo, Yan Menghan
2019 International Journal of Performability Engineering  
In order to better mine and analyze the massive volumes of data generated by search engine companies, this paper proposes a search traffic classification and dimension reduction method based on the logistic regression algorithm. Using distributed Hadoop technology, a text classification model is designed and implemented through data research, data analysis, and comparative experiments. During feature extraction over word units, a feature combination method is used, and auxiliary information such as the URL is introduced as an additional signal; the model is also optimized to address the low quality of the training samples. The experimental results show that the model optimization effectively improves the quality of the training set. Adding auxiliary information to the training set alleviates underfitting to a certain extent and improves classification performance. The accuracy and other evaluation metrics of the search traffic classification method reach a level acceptable under manual evaluation.
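The abstract does not specify the exact feature scheme, hyperparameters, or the Hadoop pipeline, so the following is only a minimal single-machine sketch of the general idea, under stated assumptions: each query is a list of word units, "feature combination" is approximated by adjacent word pairs, the URL host stands in for the auxiliary URL signal, and a binary logistic regression is trained with plain stochastic gradient descent. The names `extract_features`, `predict_proba`, and `train`, the `w=`/`w2=`/`host=` feature prefixes, and the toy samples are all hypothetical and not taken from the paper.

```python
import math
from urllib.parse import urlparse


def extract_features(tokens, url=None):
    """Hypothetical sparse binary features for one search query."""
    feats = {f"w={t}" for t in tokens}  # single word units
    # Adjacent word pairs: a simple stand-in for feature combination.
    feats |= {f"w2={a}_{b}" for a, b in zip(tokens, tokens[1:])}
    if url:
        host = urlparse(url).hostname  # auxiliary URL signal (assumed: host only)
        if host:
            feats.add(f"host={host}")
    feats.add("bias")  # intercept term shared by all samples
    return feats


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, feats):
    """P(label = 1) for a set of active binary features."""
    return sigmoid(sum(weights.get(f, 0.0) for f in feats))


def train(samples, epochs=50, lr=0.5, l2=1e-4):
    """SGD on log loss; samples is a list of (feature_set, label in {0, 1})."""
    weights = {}
    for _ in range(epochs):
        for feats, y in samples:
            g = predict_proba(weights, feats) - y  # d(log loss)/d(logit)
            for f in feats:
                w = weights.get(f, 0.0)
                # L2 penalty applied only to the features active in this sample.
                weights[f] = w - lr * (g + l2 * w)
    return weights


# Toy data (hypothetical): 1 = travel query, 0 = programming query.
RAW = [
    (["cheap", "flights", "paris"], "https://travel.example.com/x", 1),
    (["hotel", "booking", "rome"], "https://travel.example.com/y", 1),
    (["python", "list", "sort"], "https://docs.example.org/a", 0),
    (["java", "string", "split"], "https://docs.example.org/b", 0),
]
SAMPLES = [(extract_features(t, u), y) for t, u, y in RAW]
model = train(SAMPLES)
```

Sparse dictionaries keep the sketch dependency-free; at the scale described in the paper, the feature counting and training would presumably be distributed (for example over Hadoop), which this sketch does not attempt.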
doi:10.23940/ijpe.19.11.p23.30313041 fatcat:rxsmmcrgbfac3krzojasf7d7be