Autonomously Generating Hints by Inferring Problem Solving Policies

Chris Piech, Mehran Sahami, Jonathan Huang, Leonidas Guibas
2015 Proceedings of the Second (2015) ACM Conference on Learning @ Scale - L@S '15  
In this paper we autonomously generate hints for the Code.org 'Hour of Code,' (which is to the best of our knowledge the largest online course to date) using historical student data.  ...  Such predictions can form the basis for effective hint generation systems.  ...  Chris is supported by NSF-GRFP grant number DGE-114747.  ... 
doi:10.1145/2724660.2724668 dblp:conf/lats/PiechSHG15 fatcat:lhqhyalz6vg6hla3glwv2us22y

Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation [article]

Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, Shimon Whiteson
2022 arXiv   pre-print
To address this problem, we propose Symphony, which greatly improves realism by combining conventional policies with a parallel beam search.  ...  The beam search refines these policies on the fly by pruning branches that are unfavourably evaluated by a discriminator.  ...  INTRODUCTION Simulation is a crucial tool for accelerating the development of autonomous driving software because it can generate adversarial interactions for training autonomous driving policies, play  ... 
arXiv:2205.03195v1 fatcat:422oic2qjzcdxkh2224byebjhu

An architectural design and evaluation of an affective tutoring system for novice programmers

Hua Leong Fwa
2018 International Journal of Educational Technology in Higher Education  
more effective tutoring as compared to the version with the affective function disabled and the students are positive on their learning experience with the ATS with the fill in the gap exercises and hints  ...  Both quantitative and qualitative techniques were used for evaluation of the effectiveness of the ATS and its usability and acceptance by student participants.  ...  The first instance when students request for hint, a generic hint relating to the topic would be displayed. For subsequent help requests, detailed hints would be provided to the students.  ... 
doi:10.1186/s41239-018-0121-2 fatcat:27lqzzixpbbz7e67xbzu7wqblu

Collaborative Driving: Learning-Aided Joint Topology Formulation and Beamforming [article]

Yao Zhang, Changle Li, Tom H. Luan, Chau Yuen, Yuchuan Fu
2022 arXiv   pre-print
Currently, autonomous vehicles are able to drive more naturally based on the driving policies learned from millions of driving miles in real environments.  ...  Finally, we discuss several potential open research problems for the proposed collaborative driving scheme.  ...  Acknowledgments This work was supported by the National Key R&D Program of China (2019YFB1600100), National Natural Science Foundation of China (U1801266 and 62101401), the Youth Innovation Team of Shaanxi  ... 
arXiv:2203.09915v1 fatcat:p3iwevz4urc6dlr7n5iyahnziu

Inverse Reinforce Learning with Nonparametric Behavior Clustering [article]

Siddharthan Rajasekaran, Jinwei Zhang, Jie Fu
2017 arXiv   pre-print
Further, to improve the computation efficiency, we remove the need of completely solving multiple IRL problems for multiple clusters during the iteration steps and introduce a resampling technique to avoid  ...  from autonomous robot cars using the Gazebo robot simulator.  ...  Let T be the set of trajectories generated by the Markov chain M_π, where π is the maximum entropy policy; the probability of a trajectory ζ under this policy is given by P(ζ|θ) = exp(r_θ(ζ)) / Σ_{τ∈T} exp(r  ... 
arXiv:1712.05514v1 fatcat:rvdvcwnoojf33px47amedqta7q

A Formal Framework for Trust Policy Negotiation in Autonomic Systems: Abduction with Soft Constraints [chapter]

Stefano Bistarelli, Fabio Martinelli, Francesco Santini
2010 Lecture Notes in Computer Science  
As a running application example throughout the paper, we reason with access control policies and credentials.  ...  In this way, we can associate the level of preference defined by the "softness" of the constraint with a "level" of trust.  ...  This problem is of interest when performing type inference involving generalized algebraic data types.  ... 
doi:10.1007/978-3-642-16576-4_20 fatcat:plikin6mbbc7vnieeou77kyxua

Adaptive low-level control of autonomous underwater vehicles using deep reinforcement learning

Ignacio Carlucho, Mariano De Paula, Sen Wang, Yvan Petillot, Gerardo G. Acosta
2018 Robotics and Autonomous Systems  
complex control problems for autonomous systems.  ...  Low-level control of autonomous underwater vehicles (AUVs) has been extensively addressed by classical control techniques.  ...  In addition, an actor-critic goal-oriented architecture was developed to aid the deep agent to achieve a more generalized policy and therefore solve a bigger range of dynamic problems.  ... 
doi:10.1016/j.robot.2018.05.016 fatcat:jtlf3pofpbgj5p32rpn6tevary

ViKiNG: Vision-Based Kilometer-Scale Navigation with Geographic Hints [article]

Dhruv Shah, Sergey Levine
2022 arXiv   pre-print
Robotic navigation has been approached as a problem of 3D reconstruction and planning, as well as an end-to-end learning problem.  ...  These models are used by a heuristic planner to identify the best waypoint in order to reach the final destination.  ...  ACKNOWLEDGMENTS This research was partially supported by DARPA Assured Autonomy, ARL DCIST CRA W911NF-17-2-0181, and DARPA RACER.  ... 
arXiv:2202.11271v2 fatcat:a3e3mi6ffzgdpdmsu3ss5x2blu

A globally optimal algorithm for TTD-MDPs

Sooraj Bhat, David L. Roberts, Mark J. Nelson, Charles L. Isbell, Michael Mateas
2007 Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems - AAMAS '07  
We improve on the existing algorithm for solving TTD-MDPs by deriving a greedy algorithm that finds a policy that provably minimizes the global KL-divergence from the target distribution.  ...  the use of Targeted Trajectory Distribution Markov Decision Processes (TTD-MDPs)-a variant of MDPs in which the goal is to realize a specified distribution of trajectories through a state space-as a general  ...  ORISE is managed by Oak Ridge Associated Universities under DOE contract number DE-AC05-06OR23100.  ... 
doi:10.1145/1329125.1329367 dblp:conf/atal/BhatRNIM07 fatcat:frohevtkxbbupa67tdv45b2mri

What does my knowing your plans tell me? [article]

Yulin Zhang and Dylan A. Shell and Jason M. O'Kane
2018 arXiv   pre-print
Privacy constraints are specified as the stipulations on what can be inferred during plan execution.  ...  The divulged plan, which can be represented by a procrustean graph, is shown to undermine privacy precisely to the extent that it can eliminate action-observation sequences that will never appear in the  ...  ACKNOWLEDGEMENTS This work was supported by the NSF through awards IIS-1453652, IIS-1527436, and IIS-1526862. We thank the anonymous reviewers for their time and valuable comments.  ... 
arXiv:1810.03873v1 fatcat:qxy2ko6fzbalpadr2hcvaal5ba

Towards Teachable Autotelic Agents [article]

Olivier Sigaud and Ahmed Akakzia and Hugo Caselles-Dupré and Cédric Colas and Pierre-Yves Oudeyer and Mohamed Chetouani
2022 arXiv   pre-print
It also shows the way forward by highlighting key research directions towards the design of autonomous agents that can be taught by ordinary people via natural pedagogy.  ...  In the field of Artificial Intelligence, these extremes respectively map to autonomous agents learning from their own signals and interactive learning agents fully taught by their teachers.  ...  Teachable versus Autonomous Agents Reinforcement learning (RL) is a process by which an agent learns to solve sequential decision problems from a reward signal Sutton et al. (1998).  ... 
arXiv:2105.11977v2 fatcat:w37mjxeaafefnko3a64on7l3uu

Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents

László Erdődi, Åvald Åslaugson Sommervoll, Fabio Massimo Zennaro
2021 Journal of Information Security and Applications  
solve an individual challenge but a more generic policy that may be applied to perform SQL injection attacks against any system instantiated randomly by our problem generator.  ...  We consider a simplification of the dynamics of SQL injection attacks by casting this problem as a security capture-the-flag challenge.  ...  Literature overview Machine learning has recently found application in many fields in order to solve problems via induction and inference, including security [12] .  ... 
doi:10.1016/j.jisa.2021.102903 fatcat:ku74n2vm6jc7nkza3cfzoa4lfe

Shared Autonomy via Deep Reinforcement Learning

Siddharth Reddy, Anca Dragan, Sergey Levine
2018 Robotics: Science and Systems XIV  
In shared autonomy, user input is combined with semi-autonomous control to achieve a common goal.  ...  We balance these two needs by discarding actions whose values fall below some threshold, then selecting the remaining action closest to the user's input.  ...  ACKNOWLEDGEMENTS We would like to thank Oleg Klimov for open-sourcing his implementation of the Lunar Lander game, which was originally developed by Atari in 1979.  ... 
doi:10.15607/rss.2018.xiv.005 dblp:conf/rss/ReddyDL18 fatcat:pd5wcvxrn5f5tayv7zyjhe54l4

Learning structured reactive navigation plans from executing MDP navigation policies

Michael Beetz, Thorsten Belker
2001 Proceedings of the fifth international conference on Autonomous agents - AGENTS '01  
XFRMLEARN is implemented and extensively evaluated on an autonomous mobile robot.  ...  Concurrent plans are represented in a transparent and modular form so that automatic planning techniques can make inferences about them and revise them.  ...  The research reported in this paper is partly funded by the Deutsche Forschungsgemeinschaft (DFG) under contract number BE 2200/3-1.  ... 
doi:10.1145/375735.375795 dblp:conf/agents/BeetzB01 fatcat:ovln7czxjvd7bbcuxfhdxmxrzy

Learning sequential tasks interactively from demonstrations and own experience

Kathrin Grave, Sven Behnke
2013 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems  
Using a Gaussian Process approximation of the state-action sequence value function, our approach generalizes values observed from demonstrated and autonomously generated action sequences to unknown inputs  ...  In this paper, we propose an intuitive learning method for a robot to acquire sequences of motions by combining learning from human demonstrations and reinforcement learning.  ...  Generalization is performed by analyzing multiple demonstrations of the same task, or by interactively asking the teacher to relax preconditions by hinting at irrelevant task features.  ... 
doi:10.1109/iros.2013.6696816 dblp:conf/iros/GraveB13 fatcat:cjqqkkr6s5enlb7i6kvn6w5lqm