Personalized news recommendation via implicit social experts

Chen Lin, Runquan Xie, Xinjun Guan, Lei Li, Tao Li
2014 Information Sciences  
Personalized news recommendation has become a promising research direction as the Internet provides fast access to real-time information around the world. A variety of news recommender systems based on different strategies have been proposed to provide news personalization services for online news readers. However, little research has been reported on utilizing the implicit "social" factors (i.e., the potential influential experts in the news reading community) among news readers to facilitate news personalization. In this paper, we investigate the feasibility of integrating content-based methods, collaborative filtering and information diffusion models by employing probabilistic matrix factorization techniques. We propose PRemiSE, a novel Personalized news Recommendation framework via implicit Social Experts, in which the opinions of potential influencers on virtual social networks extracted from implicit feedback are treated as auxiliary resources for recommendation. We evaluate and compare our proposed recommendation method with various baselines on a collection of news articles obtained from multiple popular news websites. Experimental results demonstrate the efficacy and effectiveness of our method, particularly in handling the so-called cold-start problem.
... binary ratings, where "1" indicates a click on a news story, while "0" indicates a non-click. Therefore, although previous studies have been presented in many other domains [7], a couple of critical issues remain unsolved in news recommendation. Firstly, how can the data sparsity problem [7,1] be overcome? Many online users read only a limited number of news stories compared with the entire repository, and hence the access matrix is very sparse. Collaborative filtering is particularly sensitive to sparse historical consumption, since it cannot effectively capture users' access patterns. To address this problem, model-based collaborative filtering (e.g., matrix factorization, probabilistic matrix factorization) [20, 36] is most commonly adopted to reduce dimensionality and consequently reduce the level of sparsity. However, how to handle implicit feedback with model-based collaborative filtering is still an open question [15]. The dynamic nature of news stories exacerbates the sparsity problem. Collaborative filtering methods are ineffective at grouping similar users whose historical consumption hardly overlaps in time.
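The model-based collaborative filtering mentioned above can be illustrated with a minimal sketch of matrix factorization on a small binary access matrix. This is not the PRemiSE model: it assumes Gaussian noise and Gaussian priors (so that fitting reduces to regularized least squares), uses plain full-batch gradient descent, and treats every unmasked entry, including the zeros, as observed. The names `pmf_fit` and `pmf_loss`, the toy matrix and all hyperparameters are hypothetical.

```python
import numpy as np

def pmf_loss(R, mask, U, V, lam=0.1):
    """Regularized squared error over the entries where mask == 1."""
    E = mask * (R - U @ V.T)
    return 0.5 * np.sum(E ** 2) + 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))

def pmf_fit(R, mask, k=2, lam=0.1, lr=0.05, epochs=500, seed=0):
    """Approximate R by U @ V.T with k latent factors per user and per story."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))
    V = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        E = mask * (R - U @ V.T)       # residuals on unmasked entries only
        U_step = E @ V - lam * U       # negative gradient of the loss w.r.t. U
        V_step = E.T @ U - lam * V     # negative gradient of the loss w.r.t. V
        U += lr * U_step
        V += lr * V_step
    return U, V

# Toy access matrix (assumption): 4 users x 5 stories, 1 = click, 0 = non-click.
R = np.array([[1, 1, 0, 0, 0],
              [1, 1, 0, 0, 1],
              [0, 0, 1, 1, 0],
              [0, 0, 1, 1, 0]], dtype=float)
mask = np.ones_like(R)
mask[0, 4] = 0.0                       # treat one entry as unknown

U, V = pmf_fit(R, mask)
predicted = U @ V.T                    # dense scores, including the unknown entry
```

Each user and each story is reduced to `k` numbers, which is the dimensionality reduction the text refers to; the masked entry still receives a score in `predicted` because it is reconstructed from the factors.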
Pure content-based approaches are less likely to be affected by a high level of data sparsity, but they suffer from the overspecialization problem. Hybrid approaches, which alleviate the drawbacks of individual recommendation strategies, provide solutions to the data sparsity problem. However, since the content of news stories varies with time, detecting common interest patterns with the aid of content analysis is not a trivial problem. Secondly, how can the cold-start problem [7,1,38], including the new user and the new item problems, be dealt with? The former results from the fact that online user groups are evolving, whereas the latter is due to the dynamic nature of news stories. Collaborative filtering and the recent trend of social networking approaches (i.e., exploring the potential of "word of mouth" in a social trust network) [29, 17] are generally not applicable to new users and new items, unless they are pseudo "new", with a few ratings [3]. For new users, many researchers turn to additional information (e.g., collecting simple user profiles by requiring new users to fill out questionnaires) for help [31, 39, 44]. However, in questionnaire-based approaches, the process of requiring extra user input can be off-putting and costly. In addition, the quality of user answers cannot be guaranteed. For new items, the content of the item is usually helpful [13, 8, 9]. In this work, we study how to incorporate content information, user feedback and the social network into a unified model to ameliorate both the data sparsity and cold-start problems, as well as to enhance the overall recommendation efficiency. Intuitively, for sparse news reading data, the key issue is not to quantify the exact value of similarity within correlated user/item pairs, but to discover more correlated candidates from massive disjoint visiting records. Naturally, this could be achieved by two means. On one hand, we can transform the sparse access matrix into a denser usage probability matrix.
In news reading systems, users prefer something new and timely. Users' visiting records are therefore disjoint, because some reasonable clicks are "missing" once the stories are out of date. The probability of a "missing" click could be estimated if proper units (e.g., named entities) are chosen to capture users' long-term interests. On the other hand, probabilistic matrix factorization reveals associations among users' feedback. For the new item problem, it is more appropriate to recommend the new item to users who have been interested in stories with similar content in the past. For new users, the "expert" opinions in the reading community may be good references for selecting stories. However, a list of real "experts" in the social network is usually not obtainable. Modeling the information flow patterns in the virtual social network, where people unintentionally influence each other, seems to be a good simulation [41]. Inspired by the above intuitions, we propose PRemiSE, a novel Personalized news Recommendation framework via implicit Social Experts. We explore predicting "missing" clicks based on named entity preferences. Users' news reading interests are assumed to be a combination of several latent factors, which are inferred by maximizing the likelihood of observing the "real" and "manipulated" consumption history of each user, together with the content of each news story. In this way, the semantics of news stories and the structure of implicit user feedback are taken into account, leading to better recommendations to experienced users on existing news stories. For new items (stories), the content of stories is projected to latent factors. For new users without any (or enough) ratings and collaborations, stories are chosen based on the most influential users' reading choices on a virtual social network constructed from follow adoption relations among users.
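Two of the intuitions above, estimating "missing" clicks from named entity preferences and finding implicit "experts" from who reads a story before whom, can be sketched as follows. This is an illustrative reading under simplifying assumptions, not the estimator or diffusion model used in the paper: a user's entity preference is taken to be the share of their clicks that mention the entity, a story is scored by the mean preference over its entities, and a user's influence is the number of times another user read the same story later. The function names and toy data are hypothetical.

```python
import numpy as np

def densify_access_matrix(clicks, story_entities, observed_rating=1.0):
    """clicks: users x stories (0/1); story_entities: stories x entities (0/1)."""
    # Share of each user's clicked stories that mention each entity.
    n_clicks = np.maximum(clicks.sum(axis=1, keepdims=True), 1)
    user_entity_pref = (clicks @ story_entities) / n_clicks
    # Score each story by the user's mean preference over the story's entities.
    n_entities = np.maximum(story_entities.sum(axis=1), 1)
    estimate = (user_entity_pref @ story_entities.T) / n_entities
    # Real clicks keep a high rating; "missing" clicks receive the estimate.
    return np.where(clicks == 1, observed_rating, estimate)

def rank_implicit_experts(click_log):
    """click_log: iterable of (time, user, story); returns users, most followed first."""
    score = {}
    earlier_readers = {}
    for _, user, story in sorted(click_log):
        score.setdefault(user, 0)
        readers = earlier_readers.setdefault(story, [])
        for leader in readers:
            if leader != user:
                score[leader] += 1     # `user` adopted a story after `leader`
        readers.append(user)
    return sorted(score, key=lambda u: (-score[u], u))

# Toy data (assumption): 2 users, 3 stories, 2 named entities.
clicks = np.array([[1, 0, 0],
                   [0, 1, 0]])
story_entities = np.array([[1, 0],
                           [0, 1],
                           [1, 1]])
dense = densify_access_matrix(clicks, story_entities)

log = [(1, "a", "s1"), (2, "b", "s1"), (3, "c", "s1"),
       (1, "a", "s2"), (4, "b", "s2")]
experts = rank_implicit_experts(log)
```

A new user with no history could then be shown the stories most recently read by the top entries of `experts`, while `dense` could replace the binary matrix as the input to a factorization model.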
In summary, the contribution of our proposed recommendation framework is threefold:
Capable of handling the cold-start problem: Collaborative filtering and content-based methods cannot predict ratings for new users due to the lack of rating data. We try to provide reasonable recommendations to new users by leveraging the opinions of implicit "experts", and thus enhance the capability of handling the cold-start problem.
Semantically interpretable: By simultaneously modeling the observed ratings and word occurrences, our model can reveal the correlations between words and factors. Unlike collaborative filtering, in which the recommendation result is hard to explain, our approach is able to provide semantically interpretable recommendations.
Producing better news recommendations on sparse log data: Our model is specifically designed for sparse historical user access data. We pre-process the sparse binary log data to produce a denser numerical rating matrix by predicting the "missed" user-news access probability. Then, by bridging "word of mouth" and "users with similar tastes", better performance on news recommendation is achieved.
This paper is an extension of our previous paper [26]. The distinguishing contributions of this paper include the following points. (1) The original PRemiSE [26] model works with the sparse binary user-item matrix directly extracted from user log data. In this contribution, we argue that a news story is a natural choice of "item" in the recommendation framework, but it is not the best unit for representing a user's long-term preference for news events. A matrix manipulation method is presented in which a "real" purchase is assigned a high rating, while the "missing" usage probability is predicted based on user-entity preferences. (2) An additional empirical study is conducted on large-scale data sets. The observations in the empirical study
doi:10.1016/j.ins.2013.08.034 fatcat:lrx7gvofavhotiohm2xfr4rw2i