Training language models to follow instructions with human feedback
[article]
2022
arXiv
pre-print
In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. ...
Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent. ...
Thanks to those who contributed in various ways to the infrastructure used to train and deploy our models, including: Daniel Ziegler, William Saunders, Brooke Chan, Dave Cummings, Chris Hesse, Shantanu ...
arXiv:2203.02155v1
fatcat:nsjth3nazzeithrsgpggfbchci
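The entry above fine-tunes a language model against human preference judgments. A minimal sketch of the pairwise reward-model objective that underlies that kind of training, with a toy encoder standing in for the actual language model and random token ids standing in for labeler comparisons (none of this is the authors' code):

import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Scores a (prompt + completion) token sequence with a single scalar."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)

    def forward(self, tokens):  # tokens: (batch, seq_len) int64
        states, _ = self.encoder(self.embed(tokens))
        return self.score(states[:, -1]).squeeze(-1)  # one scalar per sequence

def preference_loss(rm, chosen, rejected):
    """Pairwise loss: the completion the labeler preferred should score higher."""
    return -torch.nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()

rm = RewardModel()
chosen = torch.randint(0, 1000, (4, 32))    # token ids of preferred completions
rejected = torch.randint(0, 1000, (4, 32))  # token ids of rejected completions
preference_loss(rm, chosen, rejected).backward()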
Inner Monologue: Embodied Reasoning through Planning with Language Models
[article]
2022
arXiv
pre-print
In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. ...
Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. ...
Acknowledgments The authors would like to thank Kanishka Rao and Vincent Vanhoucke for valuable feedback and discussions. ...
arXiv:2207.05608v1
fatcat:texkdnb2cfbk3hsq3k37afvvua
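The entry above has the language model re-plan from environment feedback expressed in text, with no additional training. A minimal closed-loop sketch under that assumption; complete, execute, and detect_success are hypothetical stand-ins for a text-generation call, a robot or simulator interface, and a success detector:

def complete(prompt: str) -> str:
    """Stand-in for a text-completion call; returns the next proposed action."""
    return "pick up the sponge"

def run_episode(instruction, execute, detect_success, max_steps=5):
    """Append each action's outcome to the prompt so the model can re-plan from it."""
    prompt = f"Task: {instruction}\n"
    for _ in range(max_steps):
        action = complete(prompt).strip()
        ok = execute(action)                    # robot or simulator hook
        feedback = detect_success(action, ok)   # e.g. "Success." / "Failed, object not visible."
        prompt += f"Robot: {action}\nScene: {feedback}\n"
        if "done" in action.lower():
            break
    return prompt

print(run_episode("wipe the table",
                  execute=lambda action: True,
                  detect_success=lambda action, ok: "Success." if ok else "Failed."))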
A Review on Interactive Reinforcement Learning from Human Social Feedback
2020
IEEE Access
This paper reviews methods for an interactive reinforcement learning agent to learn from human social feedback and the ways of delivering feedback. ...
A reinforcement learning agent learns how to perform a task by interacting with the environment. ...
The agent can also learn from instructions by mapping free-form natural language instructions to intermediate shaping rewards [18] or learn to follow language instructions by learning a reward function ...
doi:10.1109/access.2020.3006254
fatcat:omtnm6g6lvfenduatelslnc5ce
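The review's remark about mapping free-form instructions to intermediate shaping rewards can be illustrated with a small bonus added to the environment reward; the keyword-overlap matcher below is a deliberately crude illustration, not a method from the paper:

def shaping_bonus(instruction: str, state_description: str, scale: float = 0.1) -> float:
    """Crude shaping term: reward states whose description overlaps the instruction."""
    instr = set(instruction.lower().split())
    state = set(state_description.lower().split())
    return scale * len(instr & state) / max(len(instr), 1)

def shaped_reward(env_reward: float, instruction: str, state_description: str) -> float:
    """Environment reward plus the language-derived shaping bonus."""
    return env_reward + shaping_bonus(instruction, state_description)

print(shaped_reward(0.0, "go to the red door", "agent facing the red door"))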
Training Language Models with Language Feedback
[article]
2022
arXiv
pre-print
First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback. ...
Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm. ...
A model trained with our algorithm generates summaries that human evaluators prefer to human reference summaries ∼51% of the time. ...
arXiv:2204.14146v3
fatcat:27w63k7wofchdna7cazqbepxey
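The three-step procedure quoted above (generate refinements conditioned on the initial output and the feedback, keep the refinement most similar to the feedback, then fine-tune on it) can be sketched roughly as follows; generate and embed are placeholders for whatever language model and text encoder are actually used:

import math

def generate(prompt: str) -> str:
    """Stand-in for an LM sampling call that produces one candidate refinement."""
    feedback = prompt.split("Feedback:")[1].split("\n")[0].strip()
    return "Revised output that tries to " + feedback.lower()

def embed(text: str) -> list:
    """Stand-in embedding: a 64-dim bag-of-words hash vector."""
    vec = [0.0] * 64
    for tok in text.lower().split():
        vec[hash(tok) % 64] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def best_refinement(initial_output: str, feedback: str, n: int = 8) -> str:
    """Step 1: sample n refinements; step 2: keep the one most similar to the feedback."""
    prompt = f"Output: {initial_output}\nFeedback: {feedback}\nRefinement:"
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: cosine(embed(c), embed(feedback)))

# Step 3 (not shown): fine-tune the model on (input, best refinement) pairs.
print(best_refinement("The summary omits the main finding.", "Mention the main finding explicitly."))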
Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars
[article]
2022
arXiv
pre-print
We propose incorporating natural language voice instructions (NLI) in model-based deep reinforcement learning to train self-driving cars. ...
This paper presents a novel approach that supports natural language voice instructions to guide deep reinforcement learning (DRL) algorithms when training self-driving cars. ...
Model-Based RL with Voice Instructions: This paper proposes incorporating natural language voice instructions in model-based reinforcement learning to train self-driving car agents. ...
arXiv:2206.10249v1
fatcat:4anwaqq2irfbdlm7q7sxepx2vq
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
2021
Transactions of the Association for Computational Linguistics
We study continual learning for natural language instruction generation, by observing human users' instruction execution. ...
In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time. ...
compute syntax complexity; Ge Gao, Koji Shiono, and Takayuki Kojima for feedback on our interaction platform; and the crowdsourcing workers for participating in our data collection. ...
doi:10.1162/tacl_a_00428
fatcat:b7h6q4qpf5btdggoqu3x6xmlhq
Assistive Recipe Editing through Critiquing
[article]
2022
arXiv
pre-print
Prior studies have used pre-trained language models, or relied on small paired recipe data (e.g., a recipe paired with a similar one that satisfies a dietary constraint). ...
The model is trained for recipe completion to learn semantic relationships within recipes. ...
pre-trained recipe generators and language models. ...
arXiv:2205.02454v1
fatcat:2hjmw4t4lfbrjktr2p7n6obi2y
Intention Understanding in Human–Robot Interaction Based on Visual-NLP Semantics
2021
Frontiers in Neurorobotics
The results show that our algorithm can interact with different types of instructions, even with unseen sentence structures. ...
The proposed framework includes a language semantics module to extract the keywords regardless of how explicit the command instruction is, a visual object recognition module to identify the objects in front ...
To evaluate the efficacy of our approach, we classify human instructions into the following three types: Clear Natural Language Instructions, saying object names or synonyms clearly; Vague Natural Language ...
doi:10.3389/fnbot.2020.610139
pmid:33613223
pmcid:PMC7888278
fatcat:q47eqerkzngmlmicssut3f7xna
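The pipeline this entry describes, extracting keywords from the instruction and matching them against recognized objects, can be illustrated with a small synonym-aware matcher; the object list and synonym table are invented for the example:

SYNONYMS = {"cup": {"cup", "mug"}, "bottle": {"bottle", "flask"}}

def extract_keywords(instruction: str):
    """Keep only words that name an object we know about, by name or synonym."""
    words = set(instruction.lower().split())
    return {obj for obj, names in SYNONYMS.items() if words & names}

def match(instruction: str, detected_objects):
    """Return the detected objects the instruction refers to, if any."""
    targets = extract_keywords(instruction)
    return [obj for obj in detected_objects if obj in targets]

print(match("please hand me the mug", detected_objects=["cup", "bottle"]))  # ['cup']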
Few-shot Subgoal Planning with Language Models
[article]
2022
arXiv
pre-print
Given a text instruction, we show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences. ...
We further propose a simple strategy to re-rank language model predictions based on interaction and feedback from the environment. ...
We use this feedback as a learning signal to train a ranking model that re-ranks language model predictions. ...
arXiv:2205.14288v1
fatcat:dn46md4hg5hqnogal5cmsrahru
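Re-ranking language-model subgoal proposals with a scorer learned from environment feedback, as the entry describes, can be sketched as a simple linear scorer over candidate sequences; the features, weights, and candidates below are invented for illustration:

def features(subgoals):
    """Toy features: sequence length and how many subgoals look executable."""
    verbs = {"pick", "place", "open", "close", "go"}
    executable = sum(any(v in g for v in verbs) for g in subgoals)
    return [len(subgoals), executable]

def score(subgoals, weights=(-0.1, 1.0)):
    """Linear scorer; in the paper the weights would be learned from environment feedback."""
    return sum(w * f for w, f in zip(weights, features(subgoals)))

def rerank(candidates):
    """Pick the LM-proposed subgoal sequence the scorer likes best."""
    return max(candidates, key=score)

candidates = [
    ["go to kitchen", "pick up mug", "place mug on table"],
    ["think about mug", "imagine table"],
]
print(rerank(candidates))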
Correcting Robot Plans with Natural Language Feedback
[article]
2022
arXiv
pre-print
These corrections can be leveraged to get 81% and 93% success rates on tasks where the original planner failed, with either one or two language corrections. ...
In this paper, we explore natural language as an expressive and flexible tool for robot correction. We describe how to map from natural language sentences to transformations of cost functions. ...
The output of this model is used to map to the cost associated with the language instruction. ...
arXiv:2204.05186v1
fatcat:eshvwh5bvvfm5oz47uxakvweyq
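Mapping a language correction to a transformation of the planner's cost function, as described above, might look like adding a penalty term over trajectories; the keyword parser here is a stub, whereas the paper learns this mapping:

def correction_to_cost(correction: str):
    """Return an extra cost term over a trajectory (a list of (x, y) waypoints)."""
    text = correction.lower()
    if "too close" in text or "stay away" in text:
        # Hypothetical obstacle at the origin; penalize waypoints near it.
        return lambda traj: sum(1.0 / (0.1 + x * x + y * y) for x, y in traj)
    if "slower" in text:
        # Penalize large jumps between consecutive waypoints.
        return lambda traj: sum((x2 - x1) ** 2 + (y2 - y1) ** 2
                                for (x1, y1), (x2, y2) in zip(traj, traj[1:]))
    return lambda traj: 0.0  # correction not understood: leave the cost unchanged

def total_cost(traj, base_cost, corrections):
    """Original planner cost plus every language-derived penalty."""
    return base_cost(traj) + sum(correction_to_cost(c)(traj) for c in corrections)

traj = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0)]
print(total_cost(traj, base_cost=lambda t: len(t), corrections=["you are too close to the vase"]))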
Learning from natural instructions
2013
Machine Learning
We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions, thus allowing the teacher to communicate the relevant domain expertise to the ...
We show that our learning approach can eventually use natural language instructions to learn the target concept and play the game legally. ...
In Vogel and Jurafsky (2010) the authors use a reinforcement learning framework to train an agent to follow navigational instructions. ...
doi:10.1007/s10994-013-5407-y
fatcat:hwtlnkvfd5gklijltazvrzeuuy
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
[article]
2018
arXiv
pre-print
Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively. ...
Specifically, we integrate deep-learning-based object detection together with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve ...
ACKNOWLEDGMENT We would like to thank Masaaki Fukuda, Totaro Nakashima, Masanobu Tsukada, and Eiichi Matsumoto for their assistance with our extensive data collection. ...
arXiv:1710.06280v2
fatcat:dj7j2xaisngkpbqxcg5it4kwye
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
[article]
2022
arXiv
pre-print
The speaker-follower models have proven to be effective in vision-and-language navigation, where a speaker model is used to synthesize new instructions to augment the training data for a follower navigation ...
In this paper, we present foam, a Follower-aware speaker Model that is constantly updated given the follower feedback, so that the generated instructions can be more suitable to the current learning state ...
Acknowledgement We would like to thank the anonymous reviewers for valuable suggestions and Te-Lin Wu for helpful discussions. ...
arXiv:2206.04294v1
fatcat:q5syywbiljfwdhyzdnljp7wvt4
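The follower-aware speaker idea above, updating the instruction generator from follower feedback, can be caricatured as a loop in which follower success acts as a reward for the speaker; everything below (the single detail_level parameter, both stubs) is invented for the sketch:

import random

def speaker(route: str, detail_level: float) -> str:
    """Stub instruction generator; detail_level stands in for the speaker's parameters."""
    prefix = "turn left at the lamp, then " if detail_level > 0.5 else ""
    return prefix + f"walk to the end of {route}"

def follower_success(instruction: str) -> bool:
    """Stub follower: more detailed instructions are easier to execute."""
    return random.random() < (0.8 if "turn left" in instruction else 0.4)

detail_level = 0.0
for _ in range(20):
    instruction = speaker("the corridor", detail_level)
    reward = 1.0 if follower_success(instruction) else 0.0
    detail_level += 0.1 * (1.0 - reward)  # crude update: failures push the speaker toward more detail
print(round(detail_level, 2))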
Guiding Reinforcement Learning Exploration Using Natural Language
[article]
2017
arXiv
pre-print
In this work we present a technique to use natural language to help reinforcement learning generalize to unseen environments. ...
We then use this learned model to guide agent exploration using a modified version of policy shaping to make it more effective at learning in unseen environments. ...
Acquiring Human Feedback: Typically, training an agent using critique requires a large amount of consistent online feedback in order to build up a critique policy, a model of human feedback for a specific ...
arXiv:1707.08616v2
fatcat:4vaiamkzcrb65il225b42hvsbu
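Policy shaping, which this entry adapts to language-derived advice, combines the agent's own action distribution with an advice distribution predicted from the instruction; a minimal sketch with made-up probabilities:

def shape_policy(agent_probs, advice_probs):
    """Policy shaping: multiply the two action distributions and renormalize."""
    combined = [a * c for a, c in zip(agent_probs, advice_probs)]
    z = sum(combined)
    return [p / z for p in combined] if z > 0 else agent_probs

# Agent slightly prefers action 0; language advice ("go right") strongly prefers action 2.
agent_probs  = [0.4, 0.3, 0.3]
advice_probs = [0.1, 0.1, 0.8]
print(shape_policy(agent_probs, advice_probs))  # exploration now biased toward action 2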
Multimodal estimation and communication of latent semantic knowledge for robust execution of robot instructions
2020
The international journal of robotics research
The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments. ...
We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes to infer the meaning of instructions and execute ...
We thank Michael Noseworthy for valuable feedback on this manuscript. ...
doi:10.1177/0278364920917755
fatcat:u2w3o7h4svea5gdryo6bfe5xae
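The cumulative belief over latent world attributes described above can be illustrated with a discrete Bayesian update in which language, vision, and haptics each contribute a likelihood over one attribute (say, whether a container is full); all numbers are invented:

def update(belief, likelihood):
    """One Bayesian update of a discrete belief {hypothesis: prob} with a likelihood."""
    posterior = {h: belief[h] * likelihood.get(h, 1e-9) for h in belief}
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

belief = {"full": 0.5, "empty": 0.5}                   # prior over the latent attribute
belief = update(belief, {"full": 0.7, "empty": 0.3})   # language: "the heavy bottle"
belief = update(belief, {"full": 0.6, "empty": 0.4})   # vision: opaque container, weak evidence
belief = update(belief, {"full": 0.9, "empty": 0.1})   # haptics: high measured weight
print(belief)  # belief concentrates on "full"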
Showing results 1 — 15 out of 120,916 results