412 Hits in 6.3 sec

Improving Long-Term Metrics in Recommendation Systems using Short-Horizon Reinforcement Learning [article]

Bogdan Mazoure, Paul Mineiro, Pavithra Srinath, Reza Sharifi Sedeh, Doina Precup, Adith Swaminathan
<span title="2021-09-14">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
We develop a new reinforcement learning algorithm called Short Horizon Policy Improvement (SHPI) that approximates policy-induced drift in user behavior across sessions.  ...  Optimizing a long-term metric is challenging because the learning signal (whether the recommendations achieved their desired goals) is delayed and confounded by other user interactions with the system.  ...  Offline Short-Horizon Policy Iteration We now turn to the offline problem setting where we must recover a policy π that improves over µ using only the Algorithm 2: Offline SHPI Input : Batch D, horizon  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2106.00589v2">arXiv:2106.00589v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/2bf6vgp6xzfvre7dnhgnkwabpm">fatcat:2bf6vgp6xzfvre7dnhgnkwabpm</a> </span>

Personalization for Web-based Services using Offline Reinforcement Learning [article]

Pavlos Athanasios Apostolopoulos, Zehui Wang, Hanson Wang, Chad Zhou, Kittipat Virochsiri, Norm Zhou, Igor L. Markov
<span title="2021-02-10">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Deployed in a production system for user authentication in a major social network, it significantly improves long-term objectives.  ...  We address challenges of learning such policies through model-free offline Reinforcement Learning (RL) with off-policy training.  ...  In this work, we use Offline RL to improve personalized authentication for a Web-based service.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2102.05612v1">arXiv:2102.05612v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/sj6ba75lrrecpc7h7xn6e46e34">fatcat:sj6ba75lrrecpc7h7xn6e46e34</a> </span>

Reward Reports for Reinforcement Learning [article]

Thomas Krendl Gilbert, Sarah Dean, Nathan Lambert, Tom Zick, Aaron Snoswell
<span title="2022-04-25">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper we sketch a framework for documenting deployed learning systems, which we call Reward Reports.  ...  The desire to build good systems in the face of complex societal effects requires a dynamic approach towards equity and access.  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the Center for Human Compatible AI and the Center for Long Term Cybersecurity for their support. Manuscript submitted to ACM  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2204.10817v2">arXiv:2204.10817v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/a3ozbwynfbaofagn7ggyqmztei">fatcat:a3ozbwynfbaofagn7ggyqmztei</a> </span>

Horizon: Facebook's Open Source Applied Reinforcement Learning Platform [article]

Jason Gauci, Edoardo Conti, Yitao Liang, Kittipat Virochsiri, Yuchen He, Zachary Kaden, Vivek Narayanan, Xiaohui Ye, Zhengxing Chen, Scott Fujimoto
<span title="2019-09-04">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper we present Horizon, Facebook's open source applied reinforcement learning (RL) platform.  ...  Unlike other RL platforms, which are often designed for fast prototyping and experimentation, Horizon is designed with production use cases as top of mind.  ...  However, in production systems data is often logged as it comes in, requiring offline logic to join the data in a format suitable for RL.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1811.00260v5">arXiv:1811.00260v5</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/dq5kkotuqjfvxilhda2q7ebpgi">fatcat:dq5kkotuqjfvxilhda2q7ebpgi</a> </span>

Should I send this notification? Optimizing push notifications decision making by modeling the future [article]

Conor O'Brien, Huasen Wu, Shaodan Zhai, Dalin Guo, Wenzhe Shi, Jonathan J Hunt
<span title="2022-02-17">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
To counter these drawbacks, there is significant interest in recommender systems that optimize directly for long-term value (LTV).  ...  In this work we focus on mobile push notifications, where the long term effects of recommender system decisions can be particularly strong.  ...  Lastly, we discuss the importance of long term value (LTV) to recommender systems and future improvements planned for the system.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2202.08812v1">arXiv:2202.08812v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/6pa2aete45epfbp2mov4wx4s4u">fatcat:6pa2aete45epfbp2mov4wx4s4u</a> </span>

Lessons from Contextual Bandit Learning in a Customer Support Bot [article]

Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen
<span title="2019-06-18">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.  ...  While our current use cases focus on single step reinforcement learning (RL) and mostly in the domain of natural language processing and information retrieval we believe many of our findings are generally  ...  For example, in recommendation systems, clicks and dwell times have long been used as implicit ratings.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1905.02219v2">arXiv:1905.02219v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xxwkrcbn6jaolkcevjvor7xp7e">fatcat:xxwkrcbn6jaolkcevjvor7xp7e</a> </span>

Reinforcement Learning with a Disentangled Universal Value Function for Item Recommendation [article]

Kai Wang, Zhene Zou, Qilin Deng, Runze Wu, Jianrong Tao, Changjie Fan, Liang Chen, Peng Cui
<span title="2021-04-11">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In recent years, there are great interests as well as challenges in applying reinforcement learning (RL) to recommendation systems (RS).  ...  In this paper, we summarize three key practical challenges of large-scale RL-based recommender systems: massive state and action spaces, high-variance environment, and the unspecific reward setting in  ...  All rights reserved. user-agent interactions; (ii) the best strategy is to maximize users' overall long-term satisfaction without sacrificing the recommendations' short-term utility.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2104.02981v2">arXiv:2104.02981v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ktmwxxix25clhkke5gzxbwi5pu">fatcat:ktmwxxix25clhkke5gzxbwi5pu</a> </span>

Optimized Recommender Systems with Deep Reinforcement Learning [article]

Lucas Farris
<span title="2021-10-06">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Recommender Systems have been the cornerstone of online retailers.  ...  recommendations.  ...  Acknowledgements The proposal for this work is to leverage interaction data from large retailers, use them to generate a RL environment, and measure how different Deep Reinforcement Learning (DRL) algorithms  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2110.03039v1">arXiv:2110.03039v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/oqjh4aezxjdb7jkvrn6qw4254a">fatcat:oqjh4aezxjdb7jkvrn6qw4254a</a> </span>

Explicit User Manipulation in Reinforcement Learning Based Recommender Systems [article]

Matthew Sparr
<span title="2022-03-20">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Recommender systems on such platforms, therefore, have great potential to influence users in undesirable ways. However, it may also be possible for this form of manipulation to be used intentionally.  ...  as a significant concern in reinforcement learning based recommender systems.  ...  Several brainstorming sessions lead me to a topic in which I became engrossed. I would also like to thank those in my professional and personal life for their continued support throughout the process.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2203.10629v1">arXiv:2203.10629v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/ottjecaqkfgvrjzzzohvdxluqe">fatcat:ottjecaqkfgvrjzzzohvdxluqe</a> </span>

Reinforcement Learning in Practice: Opportunities and Challenges [article]

Yuxi Li
<span title="2022-04-22">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Then we discuss opportunities of RL, in particular, products and services, games, bandits, recommender systems, robotics, transportation, finance and economics, healthcare, education, combinatorial optimization  ...  We conclude with a discussion, attempting to answer: "Why has RL not been widely adopted in practice yet?" and "When is RL helpful?".  ...  A technical debt refers to the long-term hidden costs accumulated from expedient yet suboptimal decisions in the short term.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2202.11296v2">arXiv:2202.11296v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/xdtsmme22rfpfn6rgfotcspnhy">fatcat:xdtsmme22rfpfn6rgfotcspnhy</a> </span>

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems [article]

Sergey Levine, Aviral Kumar, George Tucker, Justin Fu
<span title="2020-11-01">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get started on research on offline reinforcement learning algorithms: reinforcement learning algorithms that utilize  ...  explored in recent work to mitigate these challenges, along with recent applications, and a discussion of perspectives on open problems in the field.  ...  Offline RL has also been used for optimizing long term treatment plans.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.01643v3">arXiv:2005.01643v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/kyw5xc4dijgz3dpuytnbcrmlam">fatcat:kyw5xc4dijgz3dpuytnbcrmlam</a> </span>

Towards Autonomous Satellite Communications: An AI-based Framework to Address System-level Challenges [article]

Juan Jose Garau-Luis and Skylar Eiskowitz and Nils Pachler and Edward Crawley and Bruce Cameron
<span title="2021-12-11">2021</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
In this paper we try to bridge this gap by characterizing the system-level needs that must be met to increase satellite autonomy, and introduce three AI-based components (Demand Estimator, Offline Planner  ...  In response to these gaps, we outline the three necessary components and highlight their interactions.  ...  Offline Planner (OP), responsible for long-term proactive decisions; and the Real Time Engine (RTE), responsible for short-term reactive decisions.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2112.06055v1">arXiv:2112.06055v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/wi64qxvo2rdlhevnpbefq6kmoy">fatcat:wi64qxvo2rdlhevnpbefq6kmoy</a> </span>

Deep Reinforcement Learning based Recommendation with Explicit User-Item Interactions Modeling [article]

Feng Liu, Ruiming Tang, Xutao Li, Weinan Zhang, Yunming Ye, Haokun Chen, Huifeng Guo, Yuzhou Zhang
<span title="2019-10-29">2019</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
systems, (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards.  ...  systems, which can consider both the dynamic adaptation and long-term rewards.  ...  systems; (2) focusing on the immediate feedback of recommended items and neglecting the long-term rewards.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1810.12027v3">arXiv:1810.12027v3</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/mlv5gkrjfnc65kpsyjjn7jj7ou">fatcat:mlv5gkrjfnc65kpsyjjn7jj7ou</a> </span>

Sequential Search with Off-Policy Reinforcement Learning [article]

Dadong Miao, Yanan Wang, Guoyu Tang, Lin Liu, Sulong Xu, Bo Long, Yun Xiao, Lingfei Wu, Yunjiang Jiang
<span title="2022-02-01">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Extensive ablation experiments demonstrate significant improvement each component brings to its state-of-the-art baseline, on a variety of offline and online metrics.  ...  selected item-only features from long-term interactions.  ...  The overall metric improvements are reported in Table 4.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2202.00245v1">arXiv:2202.00245v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/eiqxg2wf3bgava4pgbdh6jq56y">fatcat:eiqxg2wf3bgava4pgbdh6jq56y</a> </span>

Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [article]

Dhruv Shah, Peng Xu, Yao Lu, Ted Xiao, Alexander Toshev, Sergey Levine, Brian Ichter
<span title="2022-03-29">2022</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
Empirical evaluations for maze-solving and robotic manipulation tasks demonstrate that our approach improves long-horizon performance and enables better zero-shot generalization than alternative model-free  ...  However for long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and chaining lower-level skills.  ...  objectives in long-horizon tasks.  ... 
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2111.03189v2">arXiv:2111.03189v2</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/7piooauszzam3hyknzcyionuca">fatcat:7piooauszzam3hyknzcyionuca</a> </span>