Collective Noise Contrastive Estimation for Policy Transfer Learning
2016
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference
We address the problem of learning behaviour policies to optimise online metrics from heterogeneous usage data. While online metrics, e.g., click-through rate, can be optimised effectively using exploration data, such data is costly to collect in practice, as it temporarily degrades the user experience. Leveraging related data sources to improve online performance would be extremely valuable, but is not possible using current approaches. We formulate this task as a policy transfer learning
doi:10.1609/aaai.v30i1.10153
fatcat:c6lp6cjdhjbf7fb4aawyzhbndq