Learning Automated Product Recommendations Without Observable Features: An Initial Investigation
It is appealing to imagine software packages that provide personally tailored product recommendations to a consumer. One way to predict the rating of a particular product by a particular consumer is through inference from a database of previous ratings by many consumers of many products. Such a database consists of triplets of the form (product-identifier, consumer-identifier, rating). Generally such databases will be sparse, but nevertheless we may hope to derive considerable predictive
information from them. A number of groups have begun developing distributed systems to collect and predict consumer preferences. Some have put significant effort into implementation issues concerning user interfaces and the gathering and communication of data via the Internet and Usenet. Rather than launching into the development of a distributed system to address a particular consumer preference domain, our goal is first to understand the computational and statistical nature of the general problem. In this paper we develop two algorithms for this purpose and also relate them to a nearest-neighbor based algorithm of [Resnick et al., 1994]. We then examine their predictive performance and quality of recommendations on a number of synthetic and real-world databases. The real-world results suggest that, in some but not all domains, a significant improvement can be obtained over simply recommending the most popular product. At the end of the paper we discuss computational expense on large databases, the use of explicit features, and our ideas for improved inference algorithms.
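To make the data representation concrete, the following is a minimal sketch of a sparse rating database stored as (product-identifier, consumer-identifier, rating) triplets, together with a "most popular product" baseline of the kind the real-world results are compared against. It is not one of the algorithms developed in the paper. The names `Rating`, `mean_rating_by_product`, and `most_popular_unrated` are hypothetical, and interpreting "most popular" as "highest mean rating among products the consumer has not yet rated" is an assumption made only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Rating(NamedTuple):
    """One database entry: (product-identifier, consumer-identifier, rating)."""
    product_id: str
    consumer_id: str
    rating: float


def mean_rating_by_product(ratings):
    """Average the observed ratings of each product (sparse: only observed pairs)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in ratings:
        totals[r.product_id] += r.rating
        counts[r.product_id] += 1
    return {p: totals[p] / counts[p] for p in totals}


def most_popular_unrated(ratings, consumer_id) -> Optional[str]:
    """Baseline: recommend the highest-mean product this consumer has not rated.

    Assumption: "popularity" is measured by mean rating; other definitions
    (for example, number of ratings) are equally plausible.
    """
    means = mean_rating_by_product(ratings)
    seen = {r.product_id for r in ratings if r.consumer_id == consumer_id}
    candidates = {p: m for p, m in means.items() if p not in seen}
    if not candidates:
        return None  # the consumer has already rated every known product
    return max(candidates, key=candidates.get)


# A toy database; the identifiers and ratings are invented for illustration.
ratings = [
    Rating("p1", "c1", 5.0),
    Rating("p1", "c2", 4.0),
    Rating("p2", "c1", 2.0),
    Rating("p3", "c2", 3.0),
]
```

Note that this baseline ignores the identity of the consumer except to exclude products already rated, so every consumer with the same rating history receives the same recommendation. A personalized predictor must instead exploit the pattern of agreement between consumers.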