
Learning Globally Optimized Object Detector via Policy Gradient

Yongming Rao, Dahua Lin, Jiwen Lu, Jie Zhou
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper, we propose a simple yet effective method to learn a globally optimized detector for object detection, which is a simple modification to the standard cross-entropy gradient inspired by the … Benefiting from more precise gradients produced by the global optimization method, our framework significantly improves state-of-the-art object detectors. … Conclusion: In this paper, we have presented a new method for learning a globally optimized object detector via policy gradient, which is a simple yet effective modification to the standard cross-entropy gradient …
doi:10.1109/cvpr.2018.00648 dblp:conf/cvpr/RaoLL018 fatcat:mpdjww53ujcw5fzewtk6u4wlse
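The snippet above describes replacing the standard cross-entropy gradient with a globally informed, policy-gradient-style one. A minimal toy sketch of the general idea (my own illustration, not the authors' code): the cross-entropy gradient with respect to the logits is `p - y`, and a reward-weighted variant scales it by `(reward - baseline)` so updates reflect a global, possibly non-differentiable metric.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ce_gradient(logits, target):
    # Standard cross-entropy gradient w.r.t. logits: p - y
    p = softmax(logits)
    return [p[i] - (1.0 if i == target else 0.0) for i in range(len(logits))]

def reward_scaled_gradient(logits, target, reward, baseline=0.0):
    # Policy-gradient-style variant: scale the cross-entropy gradient by
    # (reward - baseline), so each sample's update is weighted by how much
    # it helps a global metric (e.g. mAP). Function names are illustrative.
    g = ce_gradient(logits, target)
    scale = reward - baseline
    return [scale * gi for gi in g]
```

With `reward == baseline` the update vanishes; with `reward > baseline` it points the same way as the ordinary cross-entropy gradient, just rescaled.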

Refactoring Policy for Compositional Generalizability using Self-Supervised Object Proposals [article]

Tongzhou Mu, Jiayuan Gu, Zhiwei Jia, Hao Tang, Hao Su
2020 arXiv   pre-print
Particularly, we implement an object-centric GNN-based student policy, whose input objects are learned from images through self-supervised learning. … We study how to learn a policy with compositional generalizability. … The initial learning rate is 0.001 and is divided by 2 every 100K gradient updates. The network is trained with the Adam optimizer for 200K gradient updates. …
arXiv:2011.00971v1 fatcat:yzjitqgs2fdhhn6ve7qdlke2p4
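The training details in the snippet above (learning rate 0.001, halved every 100K gradient updates) correspond to a simple step schedule; a minimal sketch, with the function name my own:

```python
def step_lr(step, base_lr=1e-3, gamma=0.5, interval=100_000):
    """Learning rate halved every `interval` gradient updates,
    matching the schedule described in the snippet."""
    return base_lr * gamma ** (step // interval)

# Over the stated 200K Adam updates the schedule takes three values:
# steps [0, 100K) -> 1e-3, steps [100K, 200K) -> 5e-4, step 200K -> 2.5e-4
```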

Metareasoning in Modular Software Systems: On-the-Fly Configuration Using Reinforcement Learning with Rich Contextual Representations

Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
2020 Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence and the Thirty-Second Innovative Applications of Artificial Intelligence Conference
The challenge of doing system-wide optimization is a combinatorial problem. … We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques. … A common approach to optimizing the policy parameters θ is to directly perform stochastic gradient descent on the average loss, which results in the policy gradient algorithm. …
doi:10.1609/aaai.v34i04.5965 fatcat:dbuvrlaoh5hsxbhpewix5owwku
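The snippet above notes that stochastic gradient descent on the average loss over policy parameters θ yields the policy gradient algorithm. A minimal REINFORCE sketch on a two-armed bandit (a generic toy, not the paper's pipeline; all names are my own):

```python
import math, random

def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    """REINFORCE on a 2-armed bandit where arm 1 pays reward 1 and arm 0
    pays 0. theta are policy logits; the score-function gradient for the
    sampled action a is (indicator(i == a) - softmax(theta)[i]) * reward."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    p = [0.5, 0.5]
    for _ in range(steps):
        m = max(theta)
        e = [math.exp(t - m) for t in theta]
        z = sum(e)
        p = [x / z for x in e]
        a = 0 if rng.random() < p[0] else 1   # sample an action
        r = 1.0 if a == 1 else 0.0            # bandit reward
        for i in range(2):                    # gradient ascent on E[r]
            grad = ((1.0 if i == a else 0.0) - p[i]) * r
            theta[i] += lr * grad
    return p  # final action probabilities

probs = reinforce_bandit()
# the policy should come to strongly prefer the rewarding arm
```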

Collaborative Deep Reinforcement Learning for Joint Object Search

Xiangyu Kong, Bo Xin, Yizhou Wang, Gang Hua
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively … We verify our proposed method on multiple object detection benchmarks. … By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning method that effectively learns the optimal policy for joint active object localization. …
doi:10.1109/cvpr.2017.748 dblp:conf/cvpr/KongXWH17 fatcat:6axmek7vejdddj3h65xge3t6u4

Multiobject Tracking in Videos Based on LSTM and Deep Reinforcement Learning

Ming-xin Jiang, Chao Deng, Zhi-geng Pan, Lan-fang Wang, Xing Sun
2018 Complexity  
First, the multiple objects are detected by the YOLO V2 object detector. … Finally, we conduct data association using an LSTM for each frame between the results of the object detector and the results of the single-object trackers. … … can find a globally optimal assignment, and the single-object trackers are able to find the location of the object via deep reinforcement learning. …
doi:10.1155/2018/4695890 fatcat:4o5cshsrkng4jl4hujdmqchaai
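The data-association step in the snippet above requires a globally optimal assignment between detections and trackers. A minimal sketch by brute force over permutations (a stdlib stand-in for the Hungarian algorithm, workable for the handful of objects in a typical frame; names and cost values are illustrative):

```python
from itertools import permutations

def best_assignment(cost):
    """Globally optimal detection-to-tracker assignment: minimize the total
    cost over all one-to-one pairings of n detections to n trackers."""
    n = len(cost)
    best, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best:
            best, best_perm = total, perm
    return best, list(best_perm)

# Example: 3 detections x 3 trackers; cost could be e.g. 1 - IoU
cost = [[0.9, 0.1, 0.8],
        [0.2, 0.7, 0.9],
        [0.8, 0.9, 0.1]]
```

Brute force is O(n!), so a production tracker would use the Hungarian algorithm (O(n^3)) instead; the result is the same.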

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations [article]

Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz
2019 arXiv   pre-print
The challenge of doing system-wide optimization is a combinatorial problem. … We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques. … A common approach to optimizing the policy parameters θ is to directly perform stochastic gradient descent on the average loss, which results in the policy gradient algorithm. …
arXiv:1905.05179v1 fatcat:ddghkmntj5hwxkwhq6eykkbw4q

RCAA: Relational Context-Aware Agents for Person Search [chapter]

Xiaojun Chang, Po-Yao Huang, Yi-Dong Shen, Xiaodan Liang, Yi Yang, Alexander G. Hauptmann
2018 Lecture Notes in Computer Science  
It is also worth noting that the proposed model even performs better than traditional methods with perfect pedestrian detectors. … In this paper, we address this problem by training relational context-aware agents which learn the actions to localize the target person from the gallery of whole-scene images. … Acknowledgements: This work was supported in part by the Intelligence Advanced Research Projects Activity (IARPA) via Department of the Interior/Interior Business Center (DOI/IBC) contract number D17PC00340 …
doi:10.1007/978-3-030-01240-3_6 fatcat:c7pbfawz6ne7nbnpesnmh3tqcy

Collaborative Deep Reinforcement Learning for Joint Object Search [article]

Xiangyu Kong, Bo Xin, Yizhou Wang, Gang Hua
2017 arXiv   pre-print
By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn the optimal policy for joint active object localization, which effectively … We verify our proposed method on multiple object detection benchmarks. … By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning method that effectively learns the optimal policy for joint active object localization. …
arXiv:1702.05573v1 fatcat:agtduezvm5bqlobykfk5c37k3a

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following [article]

Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama
2019 arXiv   pre-print
… language-conditioned policies to new environments. … language-conditioned policy leads to poor performance. … The goal of "forward" reinforcement learning is to find the optimal policy π*. …
arXiv:1902.07742v1 fatcat:6tjvjqd5vvaezertlmp2fh3oi4

Efficient Object Detection in Large Images using Deep Reinforcement Learning [article]

Burak Uzkent, Christopher Yeh, Stefano Ermon
2020 arXiv   pre-print
… resolution images to be run through a fine-level detector when it is dominated by small objects. … Traditionally, an object detector is applied to every part of the scene of interest, and its accuracy and computational cost increase with higher-resolution images. … To optimize the parameters θ_p^c, θ_p^f of f_p^c and f_p^f, we need to use model-free reinforcement learning algorithms such as Q-learning [51] and policy gradient [43]. …
arXiv:1912.03966v2 fatcat:ofj5mmhpwnaj5nwt6buyr7a7vi
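The snippet above mentions model-free algorithms such as Q-learning and policy gradient for optimizing a non-differentiable selection policy. A minimal tabular Q-learning update (a generic illustration of the algorithm, not the paper's coarse-to-fine setup; names are my own):

```python
def q_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Toy table: two states, two actions, all values start at zero
Q = [[0.0, 0.0], [0.0, 0.0]]
q_update(Q, 0, 1, 1.0, 1)  # observe reward 1 for action 1 in state 0
# Q[0][1] moves halfway (alpha = 0.5) toward the target of 1.0, i.e. to 0.5
```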

Avoiding Tampering Incentives in Deep RL via Decoupled Approval [article]

Jonathan Uesato, Ramana Kumar, Victoria Krakovna, Tom Everitt, Richard Ngo, Shane Legg
2020 arXiv   pre-print
How can we design agents that pursue a given objective when all feedback mechanisms are influenceable by the agent? … We present a principled solution to the problem of learning from influenceable feedback, which combines approval with a decoupled feedback collection procedure. … An advantage of reward-based policy gradient methods is that policies are optimized directly for Q^π. …
arXiv:2011.08827v1 fatcat:tuxagwxz5jh5znopncyixjui5i

Tree-Structured Reinforcement Learning for Sequential Object Localization [article]

Zequn Jie, Xiaodan Liang, Jiashi Feng, Xiaojie Jin, Wen Feng Lu, Shuicheng Yan
2017 arXiv   pre-print
Allowing multiple near-optimal policies, Tree-RL offers more diversity in search paths and is able to find multiple objects with a single feed-forward pass. … To incorporate global interdependency between objects into object localization, we propose an effective Tree-structured Reinforcement Learning (Tree-RL) approach to sequentially search for objects by fully … [19] learned an optimal policy to localize a single object through deep Q-learning. …
arXiv:1703.02710v1 fatcat:he7f3lx2ujh6vaunqzajhqlgnu

Deep 360 Pilot: Learning a Deep Agent for Piloting through 360° Sports Videos

Hou-Ning Hu, Yen-Chen Lin, Ming-Yu Liu, Hsien-Tzu Cheng, Yung-Ju Chang, Min Sun
2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We propose to directly learn an online policy of the agent from data. … Specifically, we leverage a state-of-the-art object detector to propose a few candidate objects of interest (yellow boxes in Fig. 1). … Learning: We optimize our model using stochastic gradients with batch size = 10 and maximum epochs = 400. …
doi:10.1109/cvpr.2017.153 dblp:conf/cvpr/HuLLCCS17 fatcat:lor6eecabjbarecc7dvvd5u4oa

Counterfactual Critic Multi-Agent Training for Scene Graph Generation [article]

Long Chen, Hanwang Zhang, Jun Xiao, Xiangnan He, Shiliang Pu, Shih-Fu Chang
2019 arXiv   pre-print
CMAT is a multi-agent policy gradient method that frames objects as cooperative agents, and then directly maximizes a graph-level metric as the reward. … However, we argue that the scene dynamics are not properly learned by using the prevailing cross-entropy-based supervised learning paradigm, which is not sensitive to graph inconsistency: errors at the … Multi-Agent Policy Gradient: Policy gradient is a type of reinforcement learning method that can optimize non-differentiable objectives. …
arXiv:1812.02347v3 fatcat:ggiv3msx6jcvrp52dghfv5dkcm

Unsupervised Learning of Visual 3D Keypoints for Control [article]

Boyuan Chen, Pieter Abbeel, Deepak Pathak
2021 arXiv   pre-print
The input images are embedded into latent 3D keypoints via a differentiable encoder which is trained to optimize both a multi-view consistency loss and the downstream task objective. … Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations. … During the policy gradient update, we optimize all parameters with the sum of the policy loss L_policy and the unsupervised learning loss L_unsup. …
arXiv:2106.07643v1 fatcat:5vmjphexcbeydjixga245sr764
Showing results 1–15 of 4,203 results