Interpretable Low-Dimensional Regression via Data-Adaptive Smoothing [article]

Wesley Tansey, Jesse Thomason, James G. Scott
2017 arXiv   pre-print
We consider the problem of estimating a regression function in the common situation where the number of features is small, where interpretability of the model is a high priority, and where simple linear or additive models fail to provide adequate performance. To address this problem, we present Maximum Variance Total Variation denoising (MVTV), an approach that is conceptually related both to CART and to the more recent CRISP algorithm, a state-of-the-art alternative method for interpretable
... linear regression. MVTV divides the feature space into blocks of constant value and fits the value of all blocks jointly via a convex optimization routine. Our method is fully data-adaptive, in that it incorporates highly robust routines for tuning all hyperparameters automatically. We compare our approach against CART and CRISP via both a complexity-accuracy tradeoff metric and a human study, demonstrating that MVTV is a more powerful and interpretable method.
arXiv:1708.01947v1 fatcat:qiy7pqhwz5dzjavj5wz6552fqm

The RobotSlang Benchmark: Dialog-guided Robot Localization and Navigation [article]

Shurjo Banerjee, Jesse Thomason, Jason J. Corso
2020 arXiv   pre-print
Autonomous robot systems for applications from search and rescue to assistive guidance should be able to engage in natural language dialog with people. To study such cooperative communication, we introduce Robot Simultaneous Localization and Mapping with Natural Language (RobotSlang), a benchmark of 169 natural language dialogs between a human Driver controlling a robot and a human Commander providing guidance towards navigation goals. In each trial, the pair first cooperates to localize the
... ot on a global map visible to the Commander, then the Driver follows Commander instructions to move the robot to a sequence of target objects. We introduce a Localization from Dialog History (LDH) and a Navigation from Dialog History (NDH) task where a learned agent is given dialog and visual observations from the robot platform as input and must localize in the global map or navigate towards the next target object, respectively. RobotSlang is comprised of nearly 5k utterances and over 1k minutes of robot camera and control streams. We present an initial model for the NDH task, and show that an agent trained in simulation can follow the RobotSlang dialog-based navigation instructions for controlling a physical robot platform. Code and data are available at
arXiv:2010.12639v1 fatcat:k53mcuuvmrghhivcmcyx63qiea

Language Grounding with 3D Objects [article]

Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer
2021 arXiv   pre-print
Seemingly simple natural language requests to a robot are generally underspecified, for example "Can you bring me the wireless mouse?" Flat images of candidate mice may not provide the discriminative information needed for "wireless." The world, and objects in it, are not flat images but complex 3D shapes. If a human requests an object based on any of its basic properties, such as color, shape, or texture, robots should perform the necessary exploration to accomplish the task. In particular,
... le substantial effort and progress has been made on understanding explicitly visual attributes like color and category, comparatively little progress has been made on understanding language about shapes and contours. In this work, we introduce a novel reasoning task that targets both visual and non-visual language about 3D objects. Our new benchmark, ShapeNet Annotated with Referring Expressions (SNARE) requires a model to choose which of two objects is being referenced by a natural language description. We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, it is still the case that these image-based models are weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation. We find that adding view estimation to language grounding models improves accuracy on both SNARE and when identifying objects referred to in language on a robot platform, but note that a large gap remains between these models and human performance.
arXiv:2107.12514v2 fatcat:hth2wl7xtnaojdhksfalh37gfe

Vision-and-Dialog Navigation [article]

Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer
2019 arXiv   pre-print
Further, Thomason et al.  ... 
arXiv:1907.04957v3 fatcat:sjgftfkjxjeopouffdj22feuy4

Prospection: Interpretable Plans From Language By Predicting the Future [article]

Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, Dieter Fox
2019 arXiv   pre-print
High-level human instructions often correspond to behaviors with multiple implicit steps. In order for robots to be useful in the real world, they must be able to reason over both motions and intermediate goals implied by human instructions. In this work, we propose a framework for learning representations that convert from a natural-language command to a sequence of intermediate goals for execution on a robot. A key feature of this framework is prospection, training an agent not just to
... ectly execute the prescribed command, but to predict a horizon of consequences of an action before taking it. We demonstrate the fidelity of plans generated by our framework when interpreting real, crowd-sourced natural language commands for a robot in simulated scenes.
arXiv:1903.08309v1 fatcat:hcrwafxdzzg4xegwnmqf6w3aiy

Prosodic Entrainment and Tutoring Dialogue Success [chapter]

Jesse Thomason, Huy V. Nguyen, Diane Litman
2013 Lecture Notes in Computer Science  
This study investigates the relationships between student entrainment to a tutoring dialogue system and learning. By finding the features of prosodic entrainment which correlate with learning, we hope to inform educational dialogue systems aiming to leverage entrainment. We propose a novel method to measure prosodic entrainment and find specific features which correlate with user learning. We also find differences in user entrainment with respect to tutor voice and user gender.
doi:10.1007/978-3-642-39112-5_104 fatcat:ojckt567dzhd3k4k743c7rxa3m

Improving Robot Success Detection using Static Object Data [article]

Rosario Scalise, Jesse Thomason, Yonatan Bisk, Siddhartha Srinivasa
2019 arXiv   pre-print
.10 .74 ± .07 .59 ± .08 pre pre .77 ± .05 .59 ± .06 Baseline (MC) .20 ± .00 .32 ± .00 Baseline (Rand) All code and data for reproducing our results are available at  ... 
arXiv:1904.01650v2 fatcat:tzvqfp4xhfcgdcxkut2mz2dtwm

Multi-Modal Word Synset Induction

Jesse Thomason, Raymond J. Mooney
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
embeddings trained over this development text associated with synsets of V , but achieved better performance from simple LSA, possibly due to the small size of the development corpus. 3  ...  Other recent work has used the VGG network to extract visual features from objects [Thomason et al., 2016] , for developing similarity metrics within ImageNet [Deselaers and Ferrari, 2011] , and for  ... 
doi:10.24963/ijcai.2017/575 dblp:conf/ijcai/ThomasonM17 fatcat:xea3e5jodramlfq5tchotqwxba

RMM: A Recursive Mental Model for Dialog Navigation [article]

Homero Roman Roman, Yonatan Bisk, Jesse Thomason, Asli Celikyilmaz, Jianfeng Gao
2020 arXiv   pre-print
No prior work has tackled generating navigator questions (C2) Anderson et al. (2018) Fried et al. (2018) Narayan-Chen et al. (2019) Nguyen and Daumé III (2019) Chi et al. (2020) Thomason et al. (2019)  ... 
arXiv:2005.00728v2 fatcat:5aczk2ips5alnjrsinprkd53wm

Interpreting Black Box Models via Hypothesis Testing [article]

Collin Burns, Jesse Thomason, Wesley Tansey
2019 arXiv   pre-print
While many methods for interpreting machine learning models have been proposed, they are often ad hoc, difficult to interpret, and come with limited guarantees. This is especially problematic in science and medicine, where model interpretations may be reported as discoveries or guide patient treatments. As a step toward more principled and reliable interpretations, in this paper we reframe black box model interpretability as a multiple hypothesis testing problem. The task is to discover
... nt" features by testing whether the model prediction is significantly different from what would be expected if the features were replaced with uninformative counterfactuals. We propose two testing methods: one that provably controls the false discovery rate but which is not yet feasible for large-scale applications, and an approximate testing method which can be applied to real-world data sets. In simulation, both tests have high power relative to existing interpretability methods. When applied to state-of-the-art vision and language models, the framework selects features that intuitively explain model predictions. The resulting explanations have the additional advantage that they are themselves easy to interpret.
arXiv:1904.00045v2 fatcat:akemves4kfdg3kxal3q47zf7ue

Improving Grounded Natural Language Understanding through Human-Robot Dialog [article]

Jesse Thomason, Aishwarya Padmakumar, Jivko Sinapov, Nick Walker, Yuqian Jiang, Harel Yedidsion, Justin Hart, Peter Stone, Raymond J. Mooney
2019 arXiv   pre-print
dialog agent that fulfills requests in natural language. 2 2 The source code for this dialog agent, as well as the deployments described in the following section, can be found at  ... 
arXiv:1903.00122v1 fatcat:lomr6crd5rahtobryhf6kpoiiq

Experience Grounds Language [article]

Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian
2020 arXiv   pre-print
., 2014; Thomason et al., 2016) and abstract, to concepts like heavy and soft.  ...  Thomason et al., 2019b; Shridhar et al., 2020) , or the real world (Tellex et al., 2011; Matuszek, 2018; Tellex et al., 2020) must translate from language to action.  ... 
arXiv:2004.10151v3 fatcat:adz5yefshjbzhmypcr7anigu5a

Augmenting Knowledge through Statistical, Goal-oriented Human-Robot Dialog [article]

Saeid Amiri, Sujay Bajracharya, Cihangir Goktolga, Jesse Thomason, Shiqi Zhang
2019 arXiv   pre-print
It is possible that the 1 Amiri, Goktolga, and Zhang are with SUNY Binghamton; 2 Bajracharya is with Cleveland State University 3 Thomason is with the University of Washington  ... 
arXiv:1907.03390v2 fatcat:h2e5uybukvbr7ca4vdq6gci3zm

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

Jesse Thomason, Daniel Gordon, Yonatan Bisk
2019 Proceedings of the 2019 Conference of the North  
We demonstrate the surprising strength of unimodal baselines in multimodal domains, and make concrete recommendations for best practices in future research. Where existing work often compares against random or majority class baselines, we argue that unimodal approaches better capture and reflect dataset biases and therefore provide an important comparison when assessing the performance of multimodal techniques. We present unimodal ablations on three recent datasets in visual navigation and QA,
... eeing an up to 29% absolute gain in performance over published baselines.
doi:10.18653/v1/n19-1197 dblp:conf/naacl/ThomasonGB19 fatcat:hpymijvtm5emfbxcbxwp6zjmti

Guiding Interaction Behaviors for Multi-modal Grounded Language Learning

Jesse Thomason, Jivko Sinapov, Raymond Mooney
2017 Proceedings of the First Workshop on Language Grounding for Robotics  
., 2017) , and multimodal (Thomason et al., 2016) spaces.  ...  Past work has used human-robot interaction to gather language predicate labels for objects in the world (Parde et al., 2015; Thomason et al., 2016) .  ... 
doi:10.18653/v1/w17-2803 dblp:conf/acl/ThomasonSM17 fatcat:jiiw4ip23vbz5cdsbiq2kp352e