How does high dimensionality affect collaborative filtering?

Alexandros Nanopoulos, Miloš Radovanović, Mirjana Ivanović
2009 Proceedings of the third ACM conference on Recommender systems - RecSys '09  
A crucial operation in memory-based collaborative filtering (CF) is determining nearest neighbors (NNs) of users/items. This paper addresses two phenomena that emerge when CF algorithms perform NN search in high-dimensional spaces that are typical in CF applications. The first is similarity concentration and the second is the appearance of hubs (i.e. points which appear in k-NN lists of many other points). Through theoretical analysis and experimental evaluation we show that these phenomena are inherent properties of high-dimensional space, unrelated to other data properties like sparsity, and that they can impact CF algorithms by questioning the meaning and representativeness of discovered NNs. Moreover, we show that it is not easy to mitigate the phenomena using dimensionality reduction. Studying these phenomena aims to provide a better understanding of the limitations of memory-based CF and motivate the development of new algorithms that would overcome them.
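The two phenomena named in the abstract can be observed on synthetic data. The following is a minimal sketch, not the paper's experimental setup: it assumes dense uniform random vectors in place of real rating data and cosine similarity as the similarity measure, and the function names (`cosine_similarities`, `k_occurrences`, `relative_spread`, `skewness`) are hypothetical helpers introduced only for illustration. It counts how often each point appears in the k-NN lists of other points, and measures how tightly pairwise similarities cluster around their mean.

```python
import numpy as np


def cosine_similarities(X):
    """Return the full matrix of pairwise cosine similarities between rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T


def k_occurrences(X, k):
    """Count how many k-NN lists (of the other points) each row of X appears in."""
    sim = cosine_similarities(X)
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor
    # Column indices of the k most similar points for every row.
    knn = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    return np.bincount(knn.ravel(), minlength=X.shape[0])


def relative_spread(X):
    """Std/mean of pairwise similarities; small values indicate concentration."""
    sim = cosine_similarities(X)
    upper = sim[np.triu_indices(X.shape[0], k=1)]  # each unordered pair once
    return upper.std() / upper.mean()


def skewness(values):
    """Sample skewness; a long right tail in k-occurrences suggests hubs."""
    v = np.asarray(values, dtype=float)
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 1000, 10
    for d in (3, 20, 100):
        X = rng.random((n, d))  # assumption: synthetic non-negative data
        counts = k_occurrences(X, k)
        print(f"d={d:3d}  relative spread={relative_spread(X):.3f}  "
              f"k-occurrence skewness={skewness(counts):.2f}  "
              f"max k-occurrence={counts.max()}")
```

Running the script prints, for each dimensionality, the relative spread of similarities and the skewness of the k-occurrence counts, so the trend across dimensions can be inspected directly. Real CF data is sparse and far from uniform, so the numbers here are only indicative.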
doi:10.1145/1639714.1639771 dblp:conf/recsys/NanopoulosRI09 fatcat:j6o54f4fvje33dqagrqn54jrzq