Article De-duplication Using Distributed Representations

Shumpei Okura, Yukihiro Tagami, Akira Tajima
2016 Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion  
In news recommendation systems, eliminating redundant information is as important as providing interesting articles for users. We propose a method that quantifies the similarity of articles based on their distributed representations, learned with category information as weak supervision. This method is useful for evaluation under tight time constraints, since estimating similarities requires only low-dimensional inner product calculations. The experimental results from human evaluation and online performance in A/B testing suggest the effectiveness of our proposed method, especially for quantifying middle-level similarities. Currently, this method is used on Yahoo! JAPAN's front page, which has millions of users per day and billions of page views per month.
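To illustrate why inner-product similarity suits tight time budgets, the following is a minimal sketch, not the authors' implementation. It assumes each article already has a low-dimensional vector (the toy unit vectors below are made up, not learned representations), and the greedy filtering strategy, the function name `deduplicate`, and the `threshold` values are all hypothetical choices for illustration.

```python
import numpy as np


def deduplicate(vectors, threshold=0.9):
    """Greedy filter (an assumed strategy, not taken from the paper).

    Walk through the articles in order and keep an article only if its
    inner product with every previously kept article is below `threshold`.
    Returns the indices of the kept articles.
    """
    kept = []
    for i, v in enumerate(vectors):
        # Each comparison is a single low-dimensional inner product.
        if all(float(np.dot(v, vectors[j])) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy 3-dimensional unit vectors standing in for article representations.
articles = np.array([
    [1.0, 0.0, 0.0],    # article 0
    [0.6, 0.8, 0.0],    # article 1: moderately similar to article 0
    [0.0, 0.0, 1.0],    # article 2: unrelated to the others
    [0.96, 0.28, 0.0],  # article 3: near-duplicate of article 0
])

kept = deduplicate(articles, threshold=0.9)
print(kept)
```

With a threshold of 0.9, only the near-duplicate (article 3) is filtered out; lowering the threshold to 0.5 would also drop the moderately similar article 1. Where to set such a threshold is exactly the kind of middle-level similarity judgment the abstract refers to.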
doi:10.1145/2872518.2889355 dblp:conf/www/OkuraTT16 fatcat:kjmmtldi4zg2nglpxrdeucaezi