A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf
.
Hierarchical Deep Q-Network from Imperfect Demonstrations in Minecraft
[article]
2020
arXiv
pre-print
We present Hierarchical Deep Q-Network (HDQfD) that took first place in the MineRL competition. HDQfD works on imperfect demonstrations and utilizes the hierarchical structure of expert trajectories. We introduce the procedure of extracting an effective sequence of meta-actions and subgoals from demonstration data. We present a structured task-dependent replay buffer and adaptive prioritizing technique that allow the HDQfD agent to gradually erase poor-quality expert data from the buffer. In
arXiv:1912.08664v4
fatcat:c7t67u2vxzgmjcqxmtwds4ve3i