A Best Match KNN-based Approach for Large-scale Product Categorization

Haohao Hu, Runjie Zhu, Yuqi Wang, Wenying Feng, Xing Tan, Jimmy Xiangji Huang
2018 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval  
We use the classic K-Nearest Neighbors (KNN) classification model and the Best Match 25 (BM25) probabilistic information retrieval model to assess how well the classic KNN model can be adapted to solve the real-life product categorization problem. This paper gives a system description of the KNN-based algorithm for solving the product classification problem. Our submitted experiments are based on the Rakuten 1M product listings dataset, provided in TSV format by the Rakuten Institute of Technology Boston. Our KNN algorithm classifies each product based on title similarity scores generated by the BM25 information retrieval model. With k = 3, our system achieved a weighted precision of 0.7809, a weighted recall of 0.7821, and a weighted F1 score of 0.7790 on the test dataset.
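The abstract describes ranking training titles by BM25 similarity to a query title and then classifying with KNN. The block below is a minimal sketch of that idea, not the authors' implementation. Several details are assumptions not stated in the abstract: whitespace tokenization, the BM25 parameters k1 = 1.2 and b = 0.75, the Lucene-style IDF, and a vote in which each of the k neighbors contributes its BM25 score (the paper may instead use a plain majority vote). The class name `BM25KNNClassifier` and the toy titles are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper does not specify preprocessing.
    return text.lower().split()


class BM25KNNClassifier:
    """Hypothetical sketch: rank training titles with BM25, then vote among the top k."""

    def __init__(self, k=3, k1=1.2, b=0.75):
        self.k = k    # number of neighbors (the abstract reports k = 3)
        self.k1 = k1  # assumed BM25 term-frequency saturation parameter
        self.b = b    # assumed BM25 length-normalization parameter

    def fit(self, titles, labels):
        self.docs = [Counter(tokenize(t)) for t in titles]
        self.doc_lens = [sum(d.values()) for d in self.docs]
        self.avgdl = sum(self.doc_lens) / len(self.docs)
        self.labels = list(labels)
        # Document frequency of each term across the training titles.
        df = Counter()
        for d in self.docs:
            df.update(d.keys())
        n = len(self.docs)
        # Lucene-style BM25 IDF, which stays non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}
        return self

    def score(self, query_terms, i):
        # BM25 score of training title i for the tokenized query.
        doc, dl = self.docs[i], self.doc_lens[i]
        s = 0.0
        for t in query_terms:
            tf = doc.get(t, 0)
            if tf == 0:
                continue
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            s += self.idf[t] * tf * (self.k1 + 1) / norm
        return s

    def predict_one(self, title):
        q = tokenize(title)
        scores = [(self.score(q, i), i) for i in range(len(self.docs))]
        top = sorted(scores, reverse=True)[: self.k]
        # Assumption: each neighbor votes with its BM25 score.
        votes = defaultdict(float)
        for s, i in top:
            votes[self.labels[i]] += s
        return max(votes, key=votes.get)

    def predict(self, titles):
        return [self.predict_one(t) for t in titles]


# Hypothetical toy data standing in for product titles and category labels.
train_titles = [
    "apple iphone 8 case black",
    "samsung galaxy phone case",
    "mens running shoes size 10",
    "womens running shoes pink",
]
train_labels = ["cases", "cases", "shoes", "shoes"]

clf = BM25KNNClassifier(k=3).fit(train_titles, train_labels)
print(clf.predict(["black phone case", "pink running shoes"]))  # → ['cases', 'shoes']
```

This sketch scores every training title for each query, which is linear in the training-set size. At the scale of a 1M-listing dataset, an inverted index (for example, a search engine such as Lucene, which provides BM25 scoring) would be the more practical way to retrieve the top-k neighbors.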
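The abstract reports weighted precision, recall and F1 without defining the averaging. A minimal sketch, assuming the common support-weighted definition (the same averaging as scikit-learn's `average='weighted'`): each per-class score is weighted by the fraction of true instances in that class. The function name `weighted_prf` and the toy labels are hypothetical.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over all classes."""
    labels = set(y_true) | set(y_pred)
    support = Counter(y_true)  # number of true instances per class
    n = len(y_true)
    p_w = r_w = f_w = 0.0
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        pred_c = sum(1 for p in y_pred if p == c)
        prec = tp / pred_c if pred_c else 0.0
        rec = tp / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        w = support[c] / n  # classes that never occur in y_true get weight 0
        p_w += w * prec
        r_w += w * rec
        f_w += w * f1
    return p_w, r_w, f_w


y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "a"]
print([round(x, 4) for x in weighted_prf(y_true, y_pred)])  # → [0.4667, 0.6, 0.52]
```

Under this definition, weighted recall always equals plain accuracy (here 3 of 5 correct), and weighted F1 need not lie between weighted precision and weighted recall.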