
Learning Multimodal Rewards from Rankings [article]

Vivek Myers, Erdem Bıyık, Nima Anari, Dorsa Sadigh
2021 arXiv   pre-print
We formulate multimodal reward learning as a mixture learning problem and develop a novel ranking-based learning approach, where the experts are only required to rank a given set of trajectories.  ...  and focus on learning a multimodal reward function.  ...  Active Learning of Multimodal Rewards from Rankings: In this section, we first present our learning framework.  ...
arXiv:2109.12750v2 fatcat:sjjuye327zbyphtth2hdx6rdoa
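
The snippet frames reward learning as fitting a mixture model to expert rankings of trajectories. As a rough illustration (not necessarily the paper's exact model), below is a minimal Plackett-Luce-style ranking likelihood for a mixture of linear reward modes; the feature matrix, mode weights, and mixture probabilities are toy placeholders.

import numpy as np

def plackett_luce_nll(ranking_feats, w):
    """Negative log-likelihood of one ranking (trajectory features ordered
    best-to-worst) under a single linear reward with weights w."""
    scores = ranking_feats @ w                      # reward of each trajectory
    nll = 0.0
    for i in range(len(scores) - 1):
        # probability that item i is preferred over all remaining items
        nll -= scores[i] - np.log(np.sum(np.exp(scores[i:])))
    return nll

def mixture_nll(ranking_feats, mode_weights, mix_probs):
    """Mixture over K reward modes: marginalize the ranking likelihood."""
    lls = np.array([-plackett_luce_nll(ranking_feats, w) for w in mode_weights])
    return -np.log(np.sum(mix_probs * np.exp(lls)))

# toy example: 4 trajectories described by 3 features, 2 reward modes
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))        # rows ordered best-to-worst by the expert
modes = [np.array([1.0, 0.0, 0.5]), np.array([-0.5, 1.0, 0.0])]
print(mixture_nll(feats, modes, np.array([0.6, 0.4])))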

Imitation Learning for Fashion Style Based on Hierarchical Multimodal Representation [article]

Shizhu Liu, Shanglin Yang, Hui Zhou
2020 arXiv   pre-print
In this work, we propose an adversarial inverse reinforcement learning formulation to recover reward functions based on hierarchical multimodal representation (HM-AIRL) during the imitation process.  ...  However, for a machine agent, learning to imitate fashion experts from demonstrations can be challenging, especially for complex styles in environments with high-dimensional, multimodal observations.  ...  At the low level, a shared multimodal variational autoencoder is employed to learn the joint representation of image and attribute information for each item.  ...
arXiv:2004.06229v1 fatcat:7ddx2atnmbfvdifubptflwmqje
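
The snippet's core ingredient is adversarial inverse reinforcement learning (AIRL). Below is a minimal sketch of the standard AIRL discriminator and the reward it induces, with the hierarchical multimodal encoder omitted; f_value and pi_prob are illustrative scalars, not the paper's learned networks.

import numpy as np

def airl_discriminator(f_value, pi_prob):
    """Standard AIRL discriminator: D = exp(f) / (exp(f) + pi(a|s)).
    f_value: learned reward/potential estimate for (s, a);
    pi_prob: probability the current policy assigns to a in s."""
    ef = np.exp(f_value)
    return ef / (ef + pi_prob)

def airl_reward(f_value, pi_prob):
    """Reward used to update the policy: log D - log(1 - D),
    which simplifies to f - log pi."""
    d = airl_discriminator(f_value, pi_prob)
    return np.log(d) - np.log(1.0 - d)

# toy check: the induced reward reduces to f - log(pi)
print(airl_reward(1.2, 0.3), 1.2 - np.log(0.3))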

Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report [article]

Renjie Zheng, Yilin Yang, Mingbo Ma, Liang Huang
2018 arXiv   pre-print
This paper describes multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for the WMT 2018 Shared Task on multimodal translation.  ...  We also explore different sequence-level training methods, including scheduled sampling and reinforcement learning, which lead to substantial improvements.  ...  We adopt a length reward on the En-Cs task to find the optimal sentence length. We use a batch size of 50, SGD optimization, a dropout rate of 0.1, and a learning rate of 1.0.  ...
arXiv:1808.10592v1 fatcat:4ph5o5vzrzgdzmyixsjn24l6wq
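
The snippet mentions sequence-level (REINFORCE-style) training with a length reward, though the exact length-reward formulation is elided in the excerpt. A hedged sketch of how such a loss is commonly assembled follows; the alpha weight and the length-bonus form are illustrative assumptions.

import numpy as np

def sequence_level_loss(log_probs, task_reward, hyp_len, target_len, alpha=0.1):
    """REINFORCE-style sequence loss: -(reward) * sum(log p(y_t)).
    task_reward: e.g. sentence-level BLEU of the sampled hypothesis;
    the length term is an illustrative bonus penalizing hypotheses far
    from the reference length (the paper's exact length reward is not
    quoted in the snippet)."""
    length_bonus = -alpha * abs(hyp_len - target_len) / max(target_len, 1)
    reward = task_reward + length_bonus
    return -reward * np.sum(log_probs)

# toy example: a 5-token sampled translation scored against a 6-token reference
log_probs = np.log(np.array([0.4, 0.3, 0.5, 0.2, 0.6]))
print(sequence_level_loss(log_probs, task_reward=0.35, hyp_len=5, target_len=6))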

Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Machine Translation System Report

Renjie Zheng, Yilin Yang, Mingbo Ma, Liang Huang
2018 Proceedings of the Third Conference on Machine Translation: Shared Task Papers  
This paper describes multimodal machine translation systems developed jointly by Oregon State University and Baidu Research for the WMT 2018 Shared Task on multimodal translation.  ...  We also explore different sequence-level training methods, including scheduled sampling and reinforcement learning, which lead to substantial improvements.  ...  We adopt a length reward on the En-Cs task to find the optimal sentence length. We use a batch size of 50, SGD optimization, a dropout rate of 0.1, and a learning rate of 1.0.  ...
doi:10.18653/v1/w18-6443 dblp:conf/wmt/ZhengYMH18 fatcat:xc77ux4pgnbcnd3x7brhzq6mue

Query-guided Regression Network with Context Policy for Phrase Grounding [article]

Kan Chen, Rama Kovvuri, Ram Nevatia
2017 arXiv   pre-print
useful cues from context in the description.  ...  State-of-the-art methods address the problem by ranking a set of proposals based on the relevance to each query, which are limited by the performance of independent proposal generation systems and ignore  ...  [22] learn multimodal correlation aided by context objects in visual content.  ... 
arXiv:1708.01676v1 fatcat:uhkaby573rfnnothcbktlgi2by
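
The snippet contrasts the proposed regression network with the common baseline of ranking proposals by their relevance to the query. Below is a minimal sketch of that ranking step only (cosine similarity between a query embedding and proposal features, both toy placeholders); QRC Net itself additionally regresses the boxes and uses context as a reward signal.

import numpy as np

def rank_proposals(query_vec, proposal_feats):
    """Rank region proposals by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = proposal_feats / np.linalg.norm(proposal_feats, axis=1, keepdims=True)
    scores = p @ q
    return np.argsort(-scores), scores   # indices from most to least relevant

# toy example: 4 proposals with 8-dim features against one phrase embedding
rng = np.random.default_rng(1)
order, scores = rank_proposals(rng.normal(size=8), rng.normal(size=(4, 8)))
print(order, np.round(scores, 3))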

Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog [article]

Jiaping Zhang, Tiancheng Zhao, Zhou Yu
2018 arXiv   pre-print
We propose a multimodal hierarchical reinforcement learning framework that dynamically integrates vision and language for task-oriented visual dialog.  ...  The framework jointly learns the multimodal dialog state representation and the hierarchical dialog policy to improve both dialog task success and efficiency.  ...  The visual dialog semantic embedding module learns a multimodal dialog state representation to support the visual  ...  (Figure 1: The information flow of the multimodal hierarchical reinforcement learning  ...)
arXiv:1805.03257v1 fatcat:dg5npzmbcrcizbwonzf75eodfy

Multimodal Hierarchical Reinforcement Learning Policy for Task-Oriented Visual Dialog

Jiaping Zhang, Tiancheng Zhao, Zhou Yu
2018 Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue  
We propose a multimodal hierarchical reinforcement learning framework that dynamically integrates vision and language for task-oriented visual dialog.  ...  The framework jointly learns the multimodal dialog state representation and the hierarchical dialog policy to improve both dialog task success and efficiency.  ...  Different from prior work, our proposed architecture uses a hierarchical dialog policy to combine two RL architectures, DQN and DRRN, within a control flow in order to jointly learn multimodal dialog  ...
doi:10.18653/v1/w18-5015 dblp:conf/sigdial/ZhangZY18 fatcat:ia7zvnbtxbcrrllvwjloknu3jm
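
The snippet notes that the hierarchical policy combines DQN and DRRN within one control flow. Below is a toy sketch of one plausible reading of that flow, assuming a high-level DQN chooses among discrete options and a DRRN-style head scores candidate utterances; all embeddings, options, and candidates are placeholders, not the paper's modules.

import numpy as np

rng = np.random.default_rng(2)

def dqn_q_values(state, n_options, dim=16):
    """High-level DQN head: one Q-value per discrete option (toy linear head)."""
    W = rng.normal(size=(n_options, dim))
    return W @ state

def drrn_q_values(state, action_texts, dim=16):
    """DRRN-style head: Q(s, a) from separate state and action-text embeddings
    combined by a dot product (embeddings here are random placeholders)."""
    a_emb = rng.normal(size=(len(action_texts), dim))
    return a_emb @ state

state = rng.normal(size=16)
options = ["query_image", "respond_to_user"]
top_option = options[int(np.argmax(dqn_q_values(state, len(options))))]
if top_option == "respond_to_user":
    candidates = ["Which region do you mean?", "It is on the left shelf."]
    best = candidates[int(np.argmax(drrn_q_values(state, candidates)))]
    print("respond:", best)
else:
    print("act:", top_option)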

Video Storytelling [article]

Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
2018 arXiv   pre-print
First, we propose a context-aware framework for multimodal embedding learning, where we design a Residual Bidirectional Recurrent Neural Network to leverage contextual information from the past and future.  ...  The Narrator is formulated as a reinforcement learning agent which is trained by directly optimizing the textual metric of the generated story.  ...  Then we introduce a variance-reduced reward to learn an effective clip selection policy.  ...
arXiv:1807.09418v1 fatcat:7fgnnfm33ngspgqlnpijppq4hu
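
The snippet mentions a variance-reduced reward for training the Narrator with a policy gradient. A minimal sketch of the usual variance-reduction device, subtracting a baseline from the reward before weighting log-probabilities, follows; the mean-reward baseline here is an assumption, as the paper's exact baseline is not quoted.

import numpy as np

def policy_gradient_with_baseline(log_probs, rewards):
    """REINFORCE with a baseline: subtracting the mean reward (or a learned
    critic) from each sample's reward reduces gradient variance without
    changing its expectation."""
    rewards = np.asarray(rewards, dtype=float)
    advantages = rewards - rewards.mean()           # simple baseline
    return -np.mean(advantages * np.asarray(log_probs))

# toy batch: 4 sampled clip selections with their textual-metric rewards
log_probs = np.log([0.2, 0.5, 0.3, 0.4])
print(policy_gradient_with_baseline(log_probs, [0.6, 0.8, 0.4, 0.7]))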

Toddler-Guidance Learning: Impacts of Critical Period on Multimodal AI Agents [article]

Junseok Park, Kwanyoung Park, Hyunseok Oh, Ganghun Lee, Minsu Lee, Youngki Lee, Byoung-Tak Zhang
2022 arXiv   pre-print
We evaluate the impact of critical periods on AI agents from two perspectives: how and when they are guided best in both uni- and multimodal learning.  ...  We study three discrete levels of mutual interaction: weak-mentor guidance (sparse reward), moderate mentor guidance (helper-reward), and mentor demonstration (behavioral cloning).  ...  This shows that a critical regime also exists for guidance from helper-rewards in the multimodal object-finding task.  ... 
arXiv:2201.04990v1 fatcat:lyvailttjjecle5eyso24rayoe

Learning and Evaluation of Dialogue Strategies for New Applications: Empirical Methods for Optimization from Small Data Sets

Verena Rieser, Oliver Lemon
2011 Computational Linguistics  
We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction with a simulated environment which is "bootstrapped" from small amounts of Wizard-of-Oz (WOZ) data.  ...  Our results show that simulation-based RL significantly outperforms the average (human wizard) strategy as learned from the data by using Supervised Learning.  ...  The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7, 2007-2013) under grant agreement number 216594 ("Computational Learning  ...
doi:10.1162/coli_a_00038 fatcat:bki2gcas2rfizosz7rb77omi4u

Bumblebees distinguish floral scent patterns, and can transfer these to corresponding visual patterns

David A. Lawson, Lars Chittka, Heather M. Whitney, Sean A. Rands
2018 Proceedings of the Royal Society of London. Biological Sciences  
We show that bumblebees can learn different spatial patterns of the same scent, and that they are better at learning to distinguish between flowers when the scent pattern corresponds to a matching visual  ...  Surprisingly, once bees have learnt the spatial arrangement of a scent pattern, they subsequently prefer to visit novel unscented flowers that have an identical arrangement of visual marks, suggesting that multimodal  ...  (e) Sketch of the multimodal stimulus learning tests (presented in figure 4), where either the scent and visual patterns corresponded so that scented wells were marked (i and ii), or where the rewarded  ...
doi:10.1098/rspb.2018.0661 pmid:29899070 pmcid:PMC6015847 fatcat:3oqeqttyczawnknl7nfi7usboa

Sentiment Adaptive End-to-End Dialog Systems

Weiyan Shi, Zhou Yu
2018 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
Therefore, we propose to include user sentiment, obtained through multimodal information (acoustic, dialogic and textual), in the end-to-end learning framework to make systems more user-adaptive and effective  ...  This work is the first attempt to incorporate multimodal user information in the adaptive end-to-end dialog system training framework and attains state-of-the-art performance.  ...  In Table 3, we list the dialogic features with their relative importance ranks, obtained by ranking their feature importance scores in the classifier.  ...
doi:10.18653/v1/p18-1140 dblp:conf/acl/YuS18 fatcat:g6qahcdmcregvkrvf4b4ntzyhu
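
The snippet describes ranking dialogic features by their importance scores in a classifier. Here is a small sketch of that step using scikit-learn's feature importances; the classifier choice, feature names, and toy data are illustrative assumptions, not the paper's setup.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy stand-ins for dialogic features; the names are illustrative, not the paper's.
feature_names = ["n_interruptions", "n_repeats", "turn_length", "asr_confidence"]
rng = np.random.default_rng(3)
X = rng.normal(size=(200, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200) > 0).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranked = sorted(zip(feature_names, clf.feature_importances_),
                key=lambda t: -t[1])
for name, score in ranked:
    print(f"{name}: {score:.3f}")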

Sentiment Adaptive End-to-End Dialog Systems [article]

Weiyan Shi, Zhou Yu
2019 arXiv   pre-print
Therefore, we propose to include user sentiment, obtained through multimodal information (acoustic, dialogic and textual), in the end-to-end learning framework to make systems more user-adaptive and effective  ...  This work is the first attempt to incorporate multimodal user information in the adaptive end-to-end dialog system training framework and attains state-of-the-art performance.  ...  In Table 3, we list the dialogic features with their relative importance ranks, obtained by ranking their feature importance scores in the classifier.  ...
arXiv:1804.10731v3 fatcat:c4ypf2sm2ncflknqakcq6vg62i

An Auto-tuning Framework for Autonomous Vehicles [article]

Haoyang Fan, Zhongpu Xia, Changchun Liu, Yaqin Chen, Qi Kong
2018 arXiv   pre-print
The framework includes a novel rank-based conditional inverse reinforcement learning algorithm, an offline training strategy and an automatic method of collecting and labeling data.  ...  Many autonomous driving motion planners generate trajectories by optimizing a reward/cost functional.  ...  The reward/cost functionals are typically provided by an expert or learned from data via inverse reinforcement learning. Inverse reinforcement learning (IRL) learns the  ...
arXiv:1808.04913v1 fatcat:dlbidbwuwjh4hdo4zgcqwsknvy
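
The snippet names a rank-based conditional inverse reinforcement learning algorithm. Below is a hedged sketch of the generic rank-based ingredient only: a pairwise ranking loss pushing the expert trajectory's linear reward above sampled alternatives. The conditional part and the paper's features are not reproduced, and the numerical-gradient step is purely for demonstration.

import numpy as np

def pairwise_rank_loss(w, expert_feat, sampled_feats):
    """Logistic ranking loss encouraging the expert trajectory's reward
    (linear in features) to exceed that of each sampled trajectory."""
    r_expert = expert_feat @ w
    r_samples = sampled_feats @ w
    return np.mean(np.log1p(np.exp(r_samples - r_expert)))

# toy gradient-descent step on the reward weights (central differences)
rng = np.random.default_rng(4)
w = np.zeros(5)
expert = rng.normal(size=5)
samples = rng.normal(size=(8, 5))
eps = 1e-4
grad = np.array([(pairwise_rank_loss(w + eps * e, expert, samples)
                  - pairwise_rank_loss(w - eps * e, expert, samples)) / (2 * eps)
                 for e in np.eye(5)])
w -= 0.5 * grad
print(pairwise_rank_loss(w, expert, samples))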

Q-Learning versus SVM Study for Green Context-Aware Multimodal ITS Stations

Adel Mounir Said, Emad Abd Elrahman, Hossam Afifi
2018 Advances in Science, Technology and Engineering Systems  
The study compares Q-Learning and SVM techniques for identifying a variety of routes between two stops, ranked from best to lowest based on traces gathered from some known  ...  (ML) tools like Q-Learning or SVM: Support Vector Machines.  ...  We used a reward-based Q-learning approach to choose the best transport means available in multimodal stations.  ...
doi:10.25046/aj030539 fatcat:mmuzlazmsbgotjcvw3edeqbtbq
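
The snippet reports a reward-based Q-learning approach for choosing transport means. A minimal tabular Q-learning sketch follows: the update rule is standard, while the states, actions, and per-mode rewards below are toy placeholders rather than the paper's traces.

import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

# toy setting: 3 stops (states), 4 transport modes (actions)
rng = np.random.default_rng(5)
Q = np.zeros((3, 4))
for _ in range(500):
    s = rng.integers(3)
    a = rng.integers(4)
    r = rng.normal(loc=[1.0, 0.5, 0.2, 0.8][a])   # toy per-mode reward
    q_update(Q, s, a, r, rng.integers(3))

# rank modes at stop 0 from best to worst by learned Q-value
print(np.argsort(-Q[0]))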
Showing results 1 — 15 out of 5,000 results