120,916 Hits in 6.7 sec

Training language models to follow instructions with human feedback [article]

Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton (+8 others)
2022 arXiv   pre-print
In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback.  ...  Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.  ...  Thanks to those who contributed in various ways to the infrastructure used to train and deploy our models, including: Daniel Ziegler, William Saunders, Brooke Chan, Dave Cummings, Chris Hesse, Shantanu  ... 
arXiv:2203.02155v1 fatcat:nsjth3nazzeithrsgpggfbchci
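
At the core of the fine-tuning pipeline this abstract describes is a reward model trained on human preference comparisons. A minimal sketch of the pairwise (Bradley-Terry style) preference loss commonly used for such reward models, assuming scalar reward scores for a chosen and a rejected completion (function names are illustrative, not from the paper):

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss for reward-model training: the loss is
    -log(sigmoid(r_chosen - r_rejected)), which shrinks as the reward of
    the human-preferred completion exceeds that of the rejected one."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

# Correctly ordered rewards give a lower loss than the reversed ordering.
print(preference_loss(2.0, 0.5) < preference_loss(0.5, 2.0))
```

When both completions score equally the loss is log 2, the chance-level value; training pushes the margin between preferred and rejected completions apart.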

Inner Monologue: Embodied Reasoning through Planning with Language Models [article]

Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown (+5 others)
2022 arXiv   pre-print
In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training.  ...  Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots.  ...  Acknowledgments The authors would like to thank Kanishka Rao and Vincent Vanhoucke for valuable feedback and discussions.  ... 
arXiv:2207.05608v1 fatcat:texkdnb2cfbk3hsq3k37afvvua

A Review on Interactive Reinforcement Learning from Human Social Feedback

Jinying Lin, Zhen Ma, Randy Gomez, Keisuke Nakamura, Bo He, Guangliang Li
2020 IEEE Access  
This paper reviews methods that enable an interactive reinforcement learning agent to learn from human social feedback, as well as the ways such feedback can be delivered.  ...  A reinforcement learning agent learns how to perform a task by interacting with the environment.  ...  The agent can also learn from instruction by mapping free-form natural language instructions to intermediate shaping rewards [18], or learn to follow language instructions by learning a reward function  ... 

doi:10.1109/access.2020.3006254 fatcat:omtnm6g6lvfenduatelslnc5ce

Training Language Models with Language Feedback [article]

Jérémy Scheurer, Jon Ander Campos, Jun Shern Chan, Angelica Chen, Kyunghyun Cho, Ethan Perez
2022 arXiv   pre-print
First, we condition the language model on the initial output and feedback to generate many refinements. Second, we choose the refinement with the highest similarity to the feedback.  ...  Here, we propose to learn from natural language feedback, which conveys more information per human evaluation. We learn from language feedback on model outputs using a three-step learning algorithm.  ...  A model trained with our algorithm generates summaries that human evaluators prefer to human reference summaries ∼51% of the time.  ... 
arXiv:2204.14146v3 fatcat:27w63k7wofchdna7cazqbepxey
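
The second step of the three-step algorithm in this abstract (choosing the refinement most similar to the feedback) could be sketched as follows; bag-of-words cosine similarity stands in for the learned embedding similarity a real system would use, and all names are illustrative:

```python
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity between two strings under a bag-of-words model."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def pick_refinement(refinements: list[str], feedback: str) -> str:
    """Keep the candidate refinement that best matches the language feedback."""
    return max(refinements, key=lambda r: cosine(r, feedback))

best = pick_refinement(
    ["The report is finished.",
     "The report is finished and the deadline is Friday."],
    "mention the deadline",
)
print(best)
```

The refinement that actually incorporates the feedback scores higher and is retained for the supervised fine-tuning step.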

Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars [article]

Mingze Wang, Ziyang Zhang, Grace Hui Yang
2022 arXiv   pre-print
We propose incorporating natural language voice instructions (NLI) in model-based deep reinforcement learning to train self-driving cars.  ...  This paper presents a novel approach that supports natural language voice instructions to guide deep reinforcement learning (DRL) algorithms when training self-driving cars.  ...  Model-Based RL with Voice Instructions This paper proposes incorporating natural language voice instructions in model-based reinforcement learning to train self-driving car agents.  ... 
arXiv:2206.10249v1 fatcat:4anwaqq2irfbdlm7q7sxepx2vq

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

Noriyuki Kojima, Alane Suhr, Yoav Artzi
2021 Transactions of the Association for Computational Linguistics  
We study continual learning for natural language instruction generation by observing human users' instruction execution.  ...  In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.  ...  compute syntax complexity; Ge Gao, Koji Shiono, and Takayuki Kojima for feedback on our interaction platform; and the crowdsourcing workers for participating in our data collection.  ... 
doi:10.1162/tacl_a_00428 fatcat:b7h6q4qpf5btdggoqu3x6xmlhq

Assistive Recipe Editing through Critiquing [article]

Diego Antognini, Shuyang Li, Boi Faltings, Julian McAuley
2022 arXiv   pre-print
Prior studies have used pre-trained language models, or relied on small paired recipe data (e.g., a recipe paired with a similar one that satisfies a dietary constraint).  ...  The model is trained for recipe completion to learn semantic relationships within recipes.  ...  pre-trained recipe generators and language models.  ... 
arXiv:2205.02454v1 fatcat:2hjmw4t4lfbrjktr2p7n6obi2y

Intention Understanding in Human–Robot Interaction Based on Visual-NLP Semantics

Zhihao Li, Yishan Mu, Zhenglong Sun, Sifan Song, Jionglong Su, Jiaming Zhang
2021 Frontiers in Neurorobotics  
The results show that our algorithm can handle different types of instructions, even with unseen sentence structures.  ...  The proposed framework includes a language semantics module to extract the keywords regardless of the explicitness of the command instruction, a visual object recognition module to identify the objects in front  ...  To evaluate the efficacy of our approach, we classify human instructions into the following three types: Clear Natural Language Instructions, saying object names or synonyms clearly; Vague Natural Language  ... 
doi:10.3389/fnbot.2020.610139 pmid:33613223 pmcid:PMC7888278 fatcat:q47eqerkzngmlmicssut3f7xna

Few-shot Subgoal Planning with Language Models [article]

Lajanugen Logeswaran, Yao Fu, Moontae Lee, Honglak Lee
2022 arXiv   pre-print
Given a text instruction, we show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.  ...  We further propose a simple strategy to re-rank language model predictions based on interaction and feedback from the environment.  ...  We use this feedback as a learning signal to train a ranking model that re-ranks language model predictions.  ... 
arXiv:2205.14288v1 fatcat:dn46md4hg5hqnogal5cmsrahru
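
The re-ranking strategy this abstract mentions (adjusting language-model predictions using environment feedback) could be sketched as below; the interface is hypothetical, and a simple score combination stands in for the trained ranking model described in the paper:

```python
from typing import Callable

def rerank(candidates: list[tuple[str, float]],
           feasibility: Callable[[str], float],
           alpha: float = 1.0) -> list[str]:
    """Re-rank (subgoal_sequence, lm_logprob) candidates by adding a
    feasibility signal from the environment to the LM's log-probability."""
    scored = [(lm_logprob + alpha * feasibility(seq), seq)
              for seq, lm_logprob in candidates]
    return [seq for _, seq in sorted(scored, key=lambda t: t[0], reverse=True)]

# Toy feasibility signal: heavily penalize subgoals the environment rejects.
feasible = lambda s: 0.0 if s == "open the fridge" else -5.0
order = rerank([("open the fridge", -1.0), ("teleport to mars", -0.5)], feasible)
print(order[0])
```

An infeasible subgoal that the LM happened to prefer is demoted once environment feedback is folded into the score.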

Correcting Robot Plans with Natural Language Feedback [article]

Pratyusha Sharma, Balakumar Sundaralingam, Valts Blukis, Chris Paxton, Tucker Hermans, Antonio Torralba, Jacob Andreas, Dieter Fox
2022 arXiv   pre-print
These corrections can be leveraged to get 81% and 93% success rates on tasks where the original planner failed, with either one or two language corrections.  ...  In this paper, we explore natural language as an expressive and flexible tool for robot correction. We describe how to map from natural language sentences to transformations of cost functions.  ...  The output of this model is used to map to the cost associated with the language instruction.  ... 
arXiv:2204.05186v1 fatcat:eshvwh5bvvfm5oz47uxakvweyq
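
The idea of mapping a natural language correction onto a transformation of the planner's cost function could be sketched as follows; keyword matching stands in for the learned language-to-cost model the abstract describes, and the 1-D state representation is purely illustrative:

```python
from typing import Callable, Dict

State = Dict[str, float]  # object name -> 1-D position

def apply_correction(cost_fn: Callable[[State], float],
                     correction: str) -> Callable[[State], float]:
    """Map a correction like 'stay farther from cup' onto a new cost
    function that adds a proximity penalty around the named object."""
    prefix = "stay farther from "
    if correction.startswith(prefix):
        obj = correction[len(prefix):]
        def shaped(state: State) -> float:
            dist = abs(state["robot"] - state[obj])
            return cost_fn(state) + 1.0 / (dist + 1e-3)
        return shaped
    return cost_fn

base = lambda state: 0.0
corrected = apply_correction(base, "stay farther from cup")
# Being near the cup is now more costly than being far from it.
print(corrected({"robot": 0.0, "cup": 0.1}) > corrected({"robot": 0.0, "cup": 5.0}))
```

The planner then re-optimizes under the shaped cost, which is what lets one or two corrections recover tasks the original plan failed.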

Learning from natural instructions

Dan Goldwasser, Dan Roth
2013 Machine Learning  
We are interested in providing a way for a human teacher to interact with an automated learner using natural instructions, thus allowing the teacher to communicate the relevant domain expertise to the  ...  We show that our learning approach can eventually use natural language instructions to learn the target concept and play the game legally.  ...  In Vogel and Jurafsky (2010) the authors use a reinforcement learning framework to train an agent to follow navigational instructions.  ... 
doi:10.1007/s10994-013-5407-y fatcat:hwtlnkvfd5gklijltazvrzeuuy

Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions [article]

Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, Kuniyuki Takahashi, Yuta Tsuboi, Yuya Unno, Wilson Ko, Jethro Tan
2018 arXiv   pre-print
Comprehension of spoken natural language is an essential component for robots to communicate with humans effectively.  ...  Specifically, we integrate deep-learning-based object detection together with natural language processing technologies to handle unconstrained spoken instructions, and propose a method for robots to resolve  ...  ACKNOWLEDGMENT We would like to thank Masaaki Fukuda, Totaro Nakashima, Masanobu Tsukada, and Eiichi Matsumoto for their assistance with our extensive data collection.  ... 
arXiv:1710.06280v2 fatcat:dj7j2xaisngkpbqxcg5it4kwye

FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation [article]

Zi-Yi Dou, Nanyun Peng
2022 arXiv   pre-print
The speaker-follower models have proven to be effective in vision-and-language navigation, where a speaker model is used to synthesize new instructions to augment the training data for a follower navigation  ...  In this paper, we present foam, a Follower-aware speaker Model that is constantly updated given the follower feedback, so that the generated instructions can be more suitable to the current learning state  ...  Acknowledgement We would like to thank the anonymous reviewers for valuable suggestions and Te-Lin Wu for helpful discussions.  ... 
arXiv:2206.04294v1 fatcat:q5syywbiljfwdhyzdnljp7wvt4

Guiding Reinforcement Learning Exploration Using Natural Language [article]

Brent Harrison, Upol Ehsan, Mark O. Riedl
2017 arXiv   pre-print
In this work we present a technique to use natural language to help reinforcement learning generalize to unseen environments.  ...  We then use this learned model to guide agent exploration using a modified version of policy shaping to make it more effective at learning in unseen environments.  ...  Acquiring Human Feedback Typically, training an agent using critique requires a large amount of consistent online feedback in order to build up a critique policy, a model of human feedback for a specific  ... 
arXiv:1707.08616v2 fatcat:4vaiamkzcrb65il225b42hvsbu
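
The modified policy shaping this abstract refers to combines the agent's own action distribution with a distribution derived from predicted human critique. A minimal sketch, assuming both are given as per-action probability maps (the combination-by-product form follows standard policy shaping; the names are illustrative):

```python
def shaped_policy(agent_probs: dict, feedback_probs: dict) -> dict:
    """Policy shaping: multiply the agent's action probabilities by the
    critique-derived probabilities and renormalize, biasing exploration
    toward actions the (language-predicted) human feedback endorses."""
    combined = {a: agent_probs[a] * feedback_probs.get(a, 1.0)
                for a in agent_probs}
    z = sum(combined.values())
    return {a: p / z for a, p in combined.items()}

out = shaped_policy({"left": 0.5, "right": 0.5}, {"left": 0.9, "right": 0.1})
print(out["left"])
```

An agent that is initially indifferent between actions ends up strongly preferring the action the critique model favors, which is how the language model guides exploration in unseen environments.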

Multimodal estimation and communication of latent semantic knowledge for robust execution of robot instructions

Jacob Arkin, Daehyung Park, Subhro Roy, Matthew R Walter, Nicholas Roy, Thomas M Howard, Rohan Paul
2020 The international journal of robotics research  
The goal of this article is to enable robots to perform robust task execution following human instructions in partially observable environments.  ...  We introduce a probabilistic model that fuses linguistic knowledge with visual and haptic observations into a cumulative belief over latent world attributes to infer the meaning of instructions and execute  ...  We thank Michael Noseworthy for valuable feedback on this manuscript.  ... 
doi:10.1177/0278364920917755 fatcat:u2w3o7h4svea5gdryo6bfe5xae
Showing results 1 — 15 out of 120,916 results