
Variance Reduction in Gradient Exploration for Online Learning to Rank

Huazheng Wang, Sonwoo Kim, Eric McCord-Snook, Qingyun Wu, Hongning Wang
2019 Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR'19  
Online Learning to Rank (OL2R) algorithms learn from implicit user feedback on the fly.  ...  variance reduction.  ...
doi:10.1145/3331184.3331264 dblp:conf/sigir/WangKMWW19 fatcat:u5647tll75hvfpfdwqshswhfxi
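
The "variance reduction" this abstract refers to is about constraining DBGD-style gradient exploration. A hedged sketch of one such projection step, assuming the exploratory direction is restricted to the span of the candidate documents' feature vectors (`doc_features` and the function name are illustrative, not the authors' code):

```python
import numpy as np

def projected_direction(doc_features, rng):
    """Sample an exploratory direction inside the span of the candidate
    documents' feature vectors (doc_features: shape (n_docs, dim)).

    Directions orthogonal to every document leave all ranking scores
    unchanged, so exploring them only adds noise; projecting them out
    is the variance-reduction idea sketched here.
    """
    Q, _ = np.linalg.qr(doc_features.T)       # orthonormal basis of the span
    u = Q @ rng.standard_normal(Q.shape[1])   # random vector inside the span
    return u / np.linalg.norm(u)
```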

Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [article]

Alexander Buchholz, Jan Malte Lichtenberg, Giuseppe Di Benedetto, Yannik Stein, Vito Bellini, Matteo Ruffini
2022 arXiv   pre-print
for variance reduction.  ...  The Plackett-Luce (PL) model is ubiquitous in learning-to-rank (LTR) because it provides a useful and intuitive probabilistic model for sampling ranked lists.  ...  QMC FOR PL SAMPLING We suggest leveraging the variance reduction from QMC for two purposes: (i) for obtaining low-variance propensity estimates; (ii) for obtaining more precise gradients for our ranking  ...
arXiv:2205.06024v1 fatcat:gqqhfqirabepjkltypcmkkbzdu
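
For context: a PL model with log-weights `scores` can be sampled with the Gumbel trick, and the uniforms driving it can be swapped for quasi-Monte Carlo points. A minimal sketch under that reading, using scipy's `qmc.Sobol`; the paper's exact estimator may differ:

```python
import numpy as np
from scipy.stats import qmc  # scipy >= 1.7

def sample_pl_rankings(scores, n_samples, use_qmc=True, rng=None):
    """Sample rankings from a Plackett-Luce model with weights exp(scores).

    Gumbel trick: argsort of scores + Gumbel noise is an exact PL sample.
    With use_qmc=True the underlying uniforms come from a scrambled Sobol
    sequence, the kind of QMC substitution studied for variance reduction.
    """
    n = len(scores)
    if use_qmc:
        u = qmc.Sobol(d=n, scramble=True).random(n_samples)
    else:
        rng = rng or np.random.default_rng()
        u = rng.random((n_samples, n))
    u = np.clip(u, 1e-12, 1 - 1e-12)
    gumbel = -np.log(-np.log(u))
    return np.argsort(-(scores + gumbel), axis=1)  # each row is one ranking
```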

Ranking Policy Gradient [article]

Kaixiang Lin, Jiayu Zhou
2019 arXiv   pre-print
Toward sample-efficient RL, we propose ranking policy gradient (RPG), a policy gradient method that learns the optimal rank of a set of discrete actions.  ...  These results lead to a general off-policy learning framework, which preserves optimality, reduces variance, and improves sample efficiency.  ...  A reduction of imitation learning and structured prediction to no-regret online learning.  ...
arXiv:1906.09674v3 fatcat:h33pqii4nnec3lavzrbfjp6iry
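
RPG's central object is a policy over the relative order of discrete actions. As a hedged illustration of that flavour of objective, not the paper's exact estimator, a pairwise logistic ranking loss that pushes a target action's score above every rival's:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(scores: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pairwise logistic ranking loss over discrete actions.

    scores: (batch, n_actions) unnormalized action scores.
    target: (batch,) index of the action that should rank first.
    Penalizes every rival action whose score is not below the target's.
    """
    best = scores.gather(1, target.unsqueeze(1))            # (batch, 1)
    margins = scores - best                                 # > 0 where a rival wins
    loss = F.softplus(margins)                              # log(1 + exp(margin))
    mask = torch.ones_like(loss).scatter_(1, target.unsqueeze(1), 0.0)
    return (loss * mask).sum() / mask.sum()                 # exclude self-pairs
```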

Low-Rank Training of Deep Neural Networks for Emerging Memory Technology [article]

Albert Gural, Phillip Nadeau, Mehul Tikekar, Boris Murmann
2021 arXiv   pre-print
Yet, the ability to train at the edge is becoming increasingly important as it enables real-time adaptability to device drift and environmental variation, user customization, and federated learning across  ...  In this work, we address two key challenges for training on edge devices with non-volatile memory: low write density and low auxiliary memory.  ...  In dense NVM applications, higher bitwidths may be achievable, allowing for corresponding reductions in the LRT rank and therefore reductions in the auxiliary memory requirements.  ...
arXiv:2009.03887v2 fatcat:tebvih3q2vbkzhcg6gtyo5qrfa
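
The low-write-density constraint motivates representing weight updates in a rank-r accumulator instead of dense NVM writes. A hedged sketch of one such step via truncated SVD; the paper's exact accumulation rule differs:

```python
import numpy as np

def lrt_step(L, R, grad, lr):
    """One low-rank-training (LRT) accumulation step (sketch).

    Keep the dense weights W0 fixed in non-volatile memory and represent
    the accumulated update as L @ R with L: (m, r), R: (r, n). Each step
    folds -lr * grad into the factors by a rank-r truncated SVD.
    """
    r = L.shape[1]
    delta = L @ R - lr * grad                 # dense update to re-compress
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    return U[:, :r] * S[:r], Vt[:r, :]        # new rank-r factors
```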

Online Learning to Sample [article]

Guillaume Bouchard, Théo Trouillon, Julien Perez, Adrien Gaidon
2016 arXiv   pre-print
Stochastic Gradient Descent (SGD) is one of the most widely used techniques for online optimization in machine learning.  ...  Second, we show that the sampling distribution of an SGD algorithm can be estimated online by incrementally minimizing the variance of the gradient.  ...  The objective is to maximize the expected reward for the target policy P_w and to minimize the variance of the policy gradient for the exploration policy Q_τ.  ...
arXiv:1506.09016v2 fatcat:tu5fu5bfkbhgdp374fkepkpfzu
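
The second claim, estimating the sampling distribution online by minimizing gradient variance, roughly corresponds to importance-sampled SGD with probabilities tracking per-example gradient norms (the variance-minimizing distribution). A hedged sketch; `grad_fn` and the mixing constants are illustrative, and the paper's online update is more principled:

```python
import numpy as np

def adaptive_importance_sgd(grad_fn, w0, n, steps, lr, rng=None):
    """SGD with an online-adapted sampling distribution (sketch).

    Example i is drawn with probability p[i] and its gradient reweighted
    by 1/(n * p[i]) so the update stays unbiased; p is nudged toward the
    running per-example gradient norms, which minimizes estimator variance.
    """
    rng = rng or np.random.default_rng()
    w = w0.copy()
    p = np.full(n, 1.0 / n)
    gnorm = np.ones(n)                            # running gradient norms
    for _ in range(steps):
        i = rng.choice(n, p=p)
        g = grad_fn(i, w)                         # gradient of example i at w
        w -= lr * g / (n * p[i])                  # importance-weighted step
        gnorm[i] = 0.9 * gnorm[i] + 0.1 * np.linalg.norm(g)
        p = 0.5 * gnorm / gnorm.sum() + 0.5 / n   # mix with uniform: full support
    return w
```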

Counterfactual Online Learning to Rank [chapter]

Shengyao Zhuang, Guido Zuccon
2020 Lecture Notes in Computer Science  
In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR.  ...  Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed  ...  Second, the use of the exploration variance reduction method [35, 36] could be investigated to reduce the gradient exploration space: this may solve the uniform sampling distribution problem.  ...
doi:10.1007/978-3-030-45439-5_28 fatcat:gpqp6bfqgza6tmd67sfdbqtyky
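
COLTR's counterfactual component amounts to evaluating candidate rankers on logged clicks with inverse propensity scoring instead of live interleaving. A minimal IPS estimate under that reading; the names and the exact utility definition are illustrative:

```python
import numpy as np

def counterfactual_value(clicks, logging_probs, target_probs):
    """IPS estimate of a candidate ranker's utility from logged clicks.

    Each logged click is reweighted by the ratio of the candidate's
    propensity to the logging ranker's propensity, keeping the estimate
    unbiased for the candidate's expected click count.
    """
    w = target_probs / np.clip(logging_probs, 1e-6, None)
    return float(np.mean(w * clicks))
```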

On the Variance of Unbiased Online Recurrent Optimization [article]

Tim Cooijmans, James Martens
2019 arXiv   pre-print
The recently proposed Unbiased Online Recurrent Optimization algorithm (UORO, arXiv:1702.05043) uses an unbiased approximation of RTRL to achieve fully online gradient-based learning in RNNs.  ...  In this work we analyze the variance of the gradient estimate computed by UORO, and propose several possible changes to the method which reduce this variance both in theory and practice.  ...
arXiv:1902.02405v1 fatcat:lolcvzadgnbffnv5ehm5kbtr54
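
UORO maintains a rank-one unbiased estimate s̃ ⊗ θ̃ of the full RTRL Jacobian ds/dθ, with scalings ρ0, ρ1 chosen to minimize its variance; those scalings are what this paper analyzes. A sketch of one step, assuming Jacobian-vector-product callbacks `jvp_state` and `vjp_param` (illustrative names):

```python
import numpy as np

def uoro_update(s_tilde, theta_tilde, jvp_state, vjp_param, rng):
    """One UORO step (sketch): E[s_tilde outer theta_tilde] = ds_t/dtheta.

    jvp_state(v): (ds_{t+1}/ds_t) @ v       -- forward state sensitivity
    vjp_param(v): v @ (ds_{t+1}/dtheta)     -- parameter sensitivity
    The random sign vector nu keeps the estimate unbiased; rho0, rho1
    are the variance-minimizing scalings.
    """
    nu = rng.choice([-1.0, 1.0], size=s_tilde.shape)
    fwd = jvp_state(s_tilde)                 # propagate old rank-one estimate
    bwd = vjp_param(nu)                      # inject this step's contribution
    rho0 = np.sqrt((np.linalg.norm(theta_tilde) + 1e-12) /
                   (np.linalg.norm(fwd) + 1e-12))
    rho1 = np.sqrt((np.linalg.norm(bwd) + 1e-12) /
                   (np.linalg.norm(nu) + 1e-12))
    return rho0 * fwd + rho1 * nu, theta_tilde / rho0 + bwd / rho1
```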

Structured Prediction via Learning to Search under Bandit Feedback

Amr Sharaf, Hal Daumé III
2017 Proceedings of the 2nd Workshop on Structured Prediction for Natural Language Processing  
We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy.  ...  We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action.  ...
doi:10.18653/v1/w17-4304 dblp:conf/emnlp/SharafD17 fatcat:ago4hgc725hannaw5w7u7hfpym
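
The variance-reduction strategy the snippet alludes to is, at its simplest, a control variate: subtract a baseline from the observed loss before the score-function update. A hedged one-step sketch; the paper's fine-grained, per-action version is richer:

```python
import numpy as np

def bandit_update(theta, grad_log_prob, loss, baseline, lr, beta=0.9):
    """Score-function (REINFORCE-style) update with a running baseline.

    Subtracting the baseline leaves the gradient estimate unbiased
    (E[grad_log_prob] = 0) but reduces its variance.
    """
    theta = theta - lr * (loss - baseline) * grad_log_prob
    baseline = beta * baseline + (1 - beta) * loss   # track the mean loss
    return theta, baseline
```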

Unbiased Learning to Rank: Online or Offline? [article]

Qingyao Ai, Tao Yang, Huazheng Wang, Jiaxin Mao
2020 arXiv   pre-print
In this paper, we formalize the task of unbiased learning to rank and show that existing algorithms for offline unbiased learning and online learning to rank are two sides of the same coin.  ...  How to obtain an unbiased ranking model by learning to rank with biased user feedback is an important research question for IR.  ...  There is extensive research on extending DBGD with different result exploration strategies [50, 55, 62] and variance reduction techniques [54]. For example, Schuth et al.  ...
arXiv:2004.13574v3 fatcat:53qz55i47bdjdopxyqjvsiym6u

Structured Evolution with Compact Architectures for Scalable Policy Optimization [article]

Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E. Turner, Adrian Weller
2018 arXiv   pre-print
We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques.  ...  We do not need heuristics such as fitness shaping to learn good quality policies, resulting in a simple and theoretically motivated training mechanism.  ...  Whilst our methods are focused on variance reduction for isotropic Gaussian smoothings of an objective function F , there has been much work on adapting the smoothing online, to reflect the local properties  ... 
arXiv:1804.02395v2 fatcat:3ydvqrox55bklkatj5wbzclsqm
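
The "structured" part refers to coupling the Gaussian perturbations, e.g. making them exactly orthogonal, which provably lowers the variance of the evolution-strategies gradient estimator. A hedged sketch using QR-based orthogonalization (rows rescaled by sqrt(d) to mimic Gaussian norms; requires d >= n_pairs):

```python
import numpy as np

def es_gradient(f, theta, sigma, n_pairs, rng):
    """Antithetic ES gradient of the Gaussian smoothing of f at theta,
    using orthogonal exploration directions (structured perturbations).
    """
    d = theta.size
    G = rng.standard_normal((d, n_pairs))
    Q, _ = np.linalg.qr(G)                  # orthonormal columns, d >= n_pairs
    eps = Q.T * np.sqrt(d)                  # rows: orthogonal, Gaussian-scale
    grad = np.zeros(d)
    for e in eps:
        grad += (f(theta + sigma * e) - f(theta - sigma * e)) * e
    return grad / (2 * sigma * n_pairs)
```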

Learning Neural Ranking Models Online from Implicit User Feedback

Yiling Jia, Hongning Wang
2022 Proceedings of the ACM Web Conference 2022  
CCS CONCEPTS • Information systems → Learning to rank; • Theory of computation → Regret bounds; Online learning theory.  ...  Existing online learning to rank (OL2R) solutions are limited to linear models, which cannot capture possible non-linear relations between queries and documents.  ...  To ensure an unbiased gradient estimate, DBGD uniformly explores in the model space, which incurs high variance and high regret.  ...
doi:10.1145/3485447.3512250 fatcat:ojp2xhpv7nhmdodg6zbfkdgyx4
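
The DBGD baseline criticized in the snippet works as follows: perturb the ranker uniformly at random, interleave the two rankers' result lists, and step toward the perturbed model only if users' clicks prefer it. A minimal sketch, with `interleave_outcome` standing in for the click-based comparison (an assumption, not a library call):

```python
import numpy as np

def dbgd_step(w, delta, lr, interleave_outcome, rng):
    """One Dueling Bandit Gradient Descent step (sketch).

    Proposes w' = w + delta * u for a uniform unit direction u; moves
    toward w' only if the interleaved comparison says users prefer it.
    """
    u = rng.standard_normal(w.shape)
    u /= np.linalg.norm(u)                        # uniform on the unit sphere
    if interleave_outcome(w, w + delta * u):      # True iff candidate wins clicks
        w = w + lr * u
    return w
```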

Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction [article]

Kailun Wu, Zhangming Chan, Weijie Bian, Lejian Ren, Shiming Xiang, Shuguang Han, Hongbo Deng, Bo Zheng
2022 arXiv   pre-print
Exploration-Exploitation (E&E) algorithms are commonly adopted to deal with the feedback-loop issue in large-scale online recommender systems.  ...  From the perspective of online learning, the adoption of an exploration strategy also affects the collection of training data, which further influences model learning.  ...  On top of Gradient-TS and Gradient-UCB, we further adopt the Underestimation Refinement methods for variance estimation [43].  ...
arXiv:2112.11136v2 fatcat:44obfmblwvetjes544jysfdxka
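
One reading of gradient-driven exploration: score a candidate not only at its current embedding but at a point shifted along the prediction's input gradient, simulating where an update would move the estimate. A hedged PyTorch sketch; `model`, `alpha`, and the sign-step are illustrative, not the deployed method:

```python
import torch

def age_score(model, emb, alpha):
    """Adversarial-gradient-style exploration score (sketch).

    Takes the gradient of the predicted CTR w.r.t. the candidate's
    embedding and re-scores at the perturbed point emb + alpha * sign(g).
    """
    emb = emb.clone().requires_grad_(True)
    p = model(emb)                               # predicted CTR
    g, = torch.autograd.grad(p.sum(), emb)       # input gradient
    with torch.no_grad():
        return model(emb + alpha * g.sign())     # exploratory score
```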

Optimal Kronecker-Sum Approximation of Real Time Recurrent Learning [article]

Frederik Benzing, Marcelo Matheus Gauy, Asier Mujika, Anders Martinsson, Angelika Steger
2019 arXiv   pre-print
In contrast, the online training algorithm Real Time Recurrent Learning (RTRL) provides untruncated gradients, with the disadvantage of impractically large computational costs.  ...  One of the central goals of Recurrent Neural Networks (RNNs) is to learn long-term dependencies in sequential data.  ...  Acknowledgements We would like to thank Florian Meier and Pascal Su for helpful discussions and valuable comments on the presentation of this work.  ... 
arXiv:1902.03993v2 fatcat:ob2mnprthjdchmp22657sc2vcy
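
The paper's compression step builds on the classical nearest-Kronecker-product problem: the best A ⊗ B approximation of a matrix is a rank-one approximation of its Van Loan-Pitsianis rearrangement. A sketch of that primitive (the paper applies an optimal variant of this inside RTRL):

```python
import numpy as np

def nearest_kronecker(M, shape_a, shape_b):
    """Best A (p x q) kron B (r x s) approximation of M (p*r x q*s).

    Van Loan-Pitsianis: rearrange M so each row is the vectorized (i, j)
    block, then take the leading singular pair of the rearranged matrix.
    """
    (p, q), (r, s) = shape_a, shape_b
    R = M.reshape(p, r, q, s).transpose(0, 2, 1, 3).reshape(p * q, r * s)
    U, S, Vt = np.linalg.svd(R, full_matrices=False)
    A = np.sqrt(S[0]) * U[:, 0].reshape(p, q)
    B = np.sqrt(S[0]) * Vt[0].reshape(r, s)
    return A, B
```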

Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

Kailun Wu, Weijie Bian, Zhangming Chan, Lejian Ren, Shiming Xiang, Shu-Guang Han, Hongbo Deng, Bo Zheng
2022 Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining  
Exploration-Exploitation (E&E) algorithms are commonly adopted to deal with the feedback-loop issue in large-scale online recommender systems.  ...  From the perspective of online learning, the adoption of an exploration strategy also affects the collection of training data, which further influences model learning.  ...  On top of Gradient-TS and Gradient-UCB, we further adopt the Underestimation Refinement methods for variance estimation [39].  ...
doi:10.1145/3534678.3539461 fatcat:faq2zinu3jgw3cft2nie3t5kji

Learning Structured Predictors from Bandit Feedback for Interactive NLP

Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler
2016 Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Structured prediction from bandit feedback describes a learning scenario where instead of having access to a gold standard structure, a learner only receives partial feedback in the form of the loss value  ...  We present supervised-to-bandit simulation experiments for several NLP tasks (machine translation, sequence labeling, text classification), showing that bandit learning from relative preferences eases  ...
doi:10.18653/v1/p16-1152 dblp:conf/acl/SokolovKLR16 fatcat:fv25oaz545gmbkzis6acdtgbs4
Showing results 1 — 15 out of 11,810 results