On Sampled Metrics for Item Recommendation

Walid Krichene, Steffen Rendle
2020 Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining  
Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items.  ...  Moreover, the smaller the sampling size, the less difference there is between metrics, and for very small sampling size, all metrics collapse to the AUC metric.  ...  ACKNOWLEDGEMENTS We would like to thank Nicolas Mayoraz and Li Zhang for their helpful comments and suggestions.  ... 
doi:10.1145/3394486.3403226 fatcat:ib3iveavlzdcxjkijdwpf5uere

On sampled metrics for item recommendation

Walid Krichene, Steffen Rendle
2022 Communications of the ACM  
Item recommendation algorithms are evaluated by metrics that compare the positions of truly relevant items among the recommended items.  ...  Moreover, the smaller the sample size, the less difference there is between metrics, and for very small sample size, all metrics collapse to the AUC metric.  ...  Acknowledgment We would like to thank Nicolas Mayoraz and Li Zhang for their helpful comments and suggestions.  ... 
doi:10.1145/3535335 fatcat:lkpyhbzdofg4hnpxbdyitzzsbi

On Sampled Metrics for Item Recommendation (Extended Abstract)

Walid Krichene, Steffen Rendle
2021 Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence   unpublished
Item recommendation algorithms are evaluated by metrics that compare the positions of truly relevant items among the recommended items.  ...  Our work suggests that sampling should be avoided for metric calculation, however if an experimental study needs to sample, the proposed corrections can improve the estimates.  ...  Acknowledgements We would like to thank Nicolas Mayoraz and Li Zhang for their helpful comments and suggestions.  ... 
doi:10.24963/ijcai.2021/651 fatcat:vhwsjjvecfcvxbled2hghv6bgm

Evaluation Metrics for Item Recommendation under Sampling [article]

Steffen Rendle
2019 arXiv   pre-print
Item recommendation algorithms are evaluated using ranking metrics that depend on the positions of relevant items.  ...  Moreover the smaller the sampling size, the less difference between metrics, and for very small sampling size, all metrics collapse to the AUC metric.  ...  Acknowledgements I would like to thank Walid Krichene, Nicolas Mayoraz and Li Zhang for their helpful comments and suggestion.  ... 
arXiv:1912.02263v1 fatcat:m4obrsf6xnfjpezpfj7p3qchhi

Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms [article]

Wayne Xin Zhao, Junhua Chen, Pengfei Wang, Qi Gu, Ji-Rong Wen
2020 arXiv   pre-print
In this paper, we revisit alternative experimental settings for evaluating top-N recommendation algorithms, considering three important factors, namely dataset splitting, sampled metrics and domain selection  ...  By carefully revisiting different options, we make several important findings on the three factors, which directly provide useful suggestions on how to appropriately set up the experiments for top-N item  ...  Analysis on Sampled Metrics Next, we study the effect of sampled metrics (i.e., only a smaller set of sampled items and the ground-truth items are ranked for computing the metrics) on performance ranking  ... 
arXiv:2010.04484v1 fatcat:y4pnzgkucff4hl3paujpmbvr24

A Revisiting Study of Appropriate Offline Evaluation for Top-N Recommendation Algorithms

Wayne Xin Zhao, Zihan Lin, Zhichao Feng, Pengfei Wang, Ji-Rong Wen
2022 ACM Transactions on Information Systems  
Based on the large-scale experiments and detailed analysis, we derive several key findings on the experimental settings for evaluating recommender systems.  ...  This work presents a large-scale, systematic study on six important factors from three aspects for evaluating recommender systems.  ...  ACKNOWLEDGMENTS The authors gratefully appreciate the anonymous reviewers for their valuable and detailed comments that greatly helped to improve the quality of this article.  ... 
doi:10.1145/3545796 fatcat:lwn5xjw7jvaunapyllu7rqjnfi

Connecting User and Item Perspectives in Popularity Debiasing for Collaborative Recommendation [article]

Ludovico Boratto, Gianni Fenu, Mirko Marras
2020 arXiv   pre-print
The first one encourages equal probability of being recommended across items, while the second one encourages true positive rates for items to be equal.  ...  Then, we characterize the recommendations of representative algorithms with respect to the proposed metrics, and we show that the item probability of being recommended and the item true positive rate are  ...  We seek to address this gap with two bias metrics tailored for individual items: the first one enforces ranking probabilities for items to be the same, and the second one encourages true positive rates  ... 
arXiv:2006.04275v1 fatcat:mhkobdmtkvbehmyrui7u4gmjpi

Estimation of Fair Ranking Metrics with Incomplete Judgments [article]

Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yılmaz
2021 arXiv   pre-print
In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics.  ...  To date, these metrics typically assume the availability and completeness of protected attribute labels of items.  ...  For the Book Recommendation dataset, we assume that all recommended items are relevant and sample from the gender attribute. Comparison of Estimated vs.  ... 
arXiv:2108.05152v1 fatcat:yr6idob6lvdvpk2ptpvp3fx6ie

Estimation of Fair Ranking Metrics with Incomplete Judgments

Ömer Kırnap, Fernando Diaz, Asia Biega, Michael Ekstrand, Ben Carterette, Emine Yilmaz
2021 Proceedings of the Web Conference 2021  
In order to address this problem, we propose a sampling strategy and estimation technique for four fair ranking metrics.  ...  To date, these metrics typically assume the availability and completeness of protected attribute labels of items.  ...  For the Book Recommendation dataset, we assume that all recommended items are relevant and sample from the gender attribute. Comparison of Estimated vs.  ... 
doi:10.1145/3442381.3450080 fatcat:w53apbxggne6pgu2o4roll5rt4

On Estimating Recommendation Evaluation Metrics under Sampling [article]

Ruoming Jin, Dong Li, Benjamin Mudrak, Jing Gao, Zhi Liu
2021 arXiv   pre-print
Since the recent study (Krichene and Rendle 2020) done by Krichene and Rendle on the sampling-based top-k evaluation metric for recommendation, there has been a lot of debates on the validity of using  ...  still a lack of understanding and consensus on how sampling should be used for recommendation evaluation.  ...  for recommendation evaluation metrics.  ... 
arXiv:2103.01474v2 fatcat:4762s4vh2nclziyga3n5vh6zma

Differentiable Ranking Metric Using Relaxed Sorting For Top-K Recommendation

Hyunsung Lee, Sangwoo Cho, Yeongjae Jang, Jaekwang Kim, Honguk Woo
2021 IEEE Access  
Most recommenders generate recommendations for a user by computing the preference score of items, sorting the items according to the score, and filtering top-K-items of high scores.  ...  As a result, inconsistency occurs between existing learning objectives and ranking metrics of recommenders.  ...  Negative sampling draws n neg items along with one positive item, as normally employed for learning recommendation models [4], [6], [24], [25].  ... 
doi:10.1109/access.2021.3105389 fatcat:t7o53enkdjajdgx2m6jbacxbsu

International application of PROMIS® computerized adaptive tests: US versus country-specific item parameters can be consequential for individual patient scores

Caroline B. Terwee, Martine H.P. Crins, Leo D. Roorda, Karon F. Cook, David Cella, Niels Smits, Benjamin D. Schalet
2021 Journal of Clinical Epidemiology  
item parameters for all items (rescaled to the US metric).  ...  We recommend more studies of translated CATs to examine if strategies that allow for country-specific item parameters should be further investigated.  ...  This is currently recommended for PROMIS measures in case of substantial DIF. 4 Use country-specific item One (US) metric.  ... 
doi:10.1016/j.jclinepi.2021.01.011 pmid:33524487 fatcat:o7yf635yhvaj3ebch6d7h5pud4

Exploring GTRS Based Recommender Systems with Users of Different Rating Patterns [chapter]

Bingyu Li, JingTao Yao
2018 Lecture Notes in Computer Science  
Bingyu Li, candidate for the degree of Master of Science in Computer Science, has presented a thesis titled, Exploring GTRS Based Recommender Systems With Users of Different Rating Patterns, in an oral  ...  examination held on Julyl 4, 2018.  ...  Likewise, for each user, the item set is also classified into the categories of recommended and not recommended, based on whether or not an item is recommended to him or her.  ... 
doi:10.1007/978-3-319-99368-3_31 fatcat:rzflqgzjibfl3ladlf462zez2u

Do Offline Metrics Predict Online Performance in Recommender Systems? [article]

Karl Krauth, Sarah Dean, Alex Zhao, Wenshuo Guo, Mihaela Curmei, Benjamin Recht, Michael I. Jordan
2020 arXiv   pre-print
Furthermore, we observe that the ranking of recommenders varies depending on the amount of initial offline data available.  ...  We study the impact of adding exploration strategies, and observe that their effectiveness, when compared to greedy recommendation, is highly dependent on the recommendation algorithm.  ...  We sample 100k initial ratings on which to tune the recommenders, at every timestep we sample 200 users to recommend items to, and we run the simulation until we observe 200k ratings.  ... 
arXiv:2011.07931v1 fatcat:fre2cuepjzcv5gtnk3ulnblywu

A Differentiable Ranking Metric Using Relaxed Sorting Operation for Top-K Recommender Systems [article]

Hyunsung Lee, Yeongjae Jang, Jaekwang Kim, Honguk Woo
2020 arXiv   pre-print
A recommender system generates personalized recommendations for a user by computing the preference score of items, sorting the items according to the score, and filtering top-K items with high scores.  ...  While sorting and ranking items are integral for this recommendation procedure, it is nontrivial to incorporate them in the process of end-to-end model training since sorting is nondifferentiable and hard  ...  Through experiments, we show that DRM considerably outperforms other recommenders on real-world datasets.  ... 
arXiv:2008.13141v4 fatcat:bgxf3itixjbypblycbng3nvnuq