
Shifting the Baseline: Single Modality Performance on Visual Navigation & QA [article]

Jesse Thomason, Daniel Gordon, Yonatan Bisk
2019 arXiv   pre-print
We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to 29% absolute gain in performance over published baselines.  ...  when assessing the performance of multimodal techniques.  ...  Acknowledgements This work was supported by NSF IIS-1524371, 1703166, NRI-1637479, IIS-1338054, 1652052, ONR N00014-13-1-0720, and the DARPA CwC program through ARO (W911NF-15-1-0543).  ... 
arXiv:1811.00613v3 fatcat:squyaajibjbfjdkpd53xchibue

Shifting the Baseline: Single Modality Performance on Visual Navigation & QA

Jesse Thomason, Daniel Gordon, Yonatan Bisk
2019 Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics
We present unimodal ablations on three recent datasets in visual navigation and QA, seeing an up to 29% absolute gain in performance over published baselines.  ...  when assessing the performance of multimodal techniques.  ...  Acknowledgements This work was supported by NSF IIS-1524371, 1703166, NRI-1637479, IIS-1338054, 1652052, ONR N00014-13-1-0720, and the DARPA CwC program through ARO (W911NF-15-1-0543).  ... 
doi:10.18653/v1/n19-1197 dblp:conf/naacl/ThomasonGB19 fatcat:hpymijvtm5emfbxcbxwp6zjmti

WILDS: A Benchmark of in-the-Wild Distribution Shifts [article]

Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David (+11 others)
2021 arXiv   pre-print
On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance.  ...  Distribution shifts -- where the training distribution differs from the test distribution -- can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild.  ...  Acknowledgements Many people generously volunteered their time and expertise to advise us on Wilds.  ... 
arXiv:2012.07421v3 fatcat:bsohmukpszajxeadeo25oxmbs4

Vision-and-Dialog Navigation [article]

Jesse Thomason, Michael Murray, Maya Cakmak, Luke Zettlemoyer
2019 arXiv   pre-print
We establish an initial, multi-modal sequence-to-sequence model and demonstrate that looking farther back in the dialog history improves performance.  ...  The Navigator asks questions to their partner, the Oracle, who has privileged access to the best next steps the Navigator should take according to a shortest path planner.  ...  Acknowledgments This research was supported in part by the ARO (W911NF-16-1-0121) and the NSF (IIS-1252835, IIS-1562364). We thank the authors of Anderson et al.  ... 
arXiv:1907.04957v3 fatcat:sjgftfkjxjeopouffdj22feuy4

Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training [article]

Weituo Hao, Chunyuan Li, Xiujun Li, Lawrence Carin, Jianfeng Gao
2020 arXiv   pre-print
Learning to navigate in a visual environment following natural-language instructions is a challenging task, because the multimodal inputs to the agent are highly variable, and the training data on a new  ...  The performance is validated on three VLN tasks. On the Room-to-Room benchmark, our model improves the state-of-the-art from 47% to 51% on success rate weighted by path length.  ...  Shifting the baseline: Single modality performance on visual navigation ...  Srinivasa. Tactical rewind: Self-correction via backtracking in vision-and-language navigation.  ... 
arXiv:2002.10638v2 fatcat:zcqp4cduyzgrbfycj6hvefjrtm

VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering [article]

Cătălina Cangea, Eugene Belilovsky, Pietro Liò, Aaron Courville
2019 arXiv   pre-print
The desired outcome is that the agent learns to combine capabilities such as scene understanding, navigation and language understanding in order to perform complex reasoning in the visual world.  ...  The goal of this dataset is to assess question-answering performance from nearly-ideal navigation paths, while considering a much more complete variety of questions than current instantiations of the EQA  ...  Acknowledgements We wish to thank Ankesh Anand and Ethan Perez for the useful discussions over the course of this project.  ... 
arXiv:1908.04950v1 fatcat:v7pqiuv5rjbb7nqna7thv33zp4

Embodied Question Answering in Photorealistic Environments with Point Cloud Perception [article]

Erik Wijmans, Samyak Datta, Oleksandr Maksymets, Abhishek Das, Georgia Gkioxari, Stefan Lee, Irfan Essa, Devi Parikh, Dhruv Batra
2019 arXiv   pre-print
We find a novel loss-weighting scheme we call Inflection Weighting to be important when training recurrent models for navigation with behavior cloning and are able to outperform the baselines with this  ...  We find that two seemingly naive navigation baselines, forward-only and random, are strong navigators and challenging to outperform, due to the specific choice of the evaluation setting presented by [1  ...  The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the  ... 
arXiv:1904.03461v1 fatcat:wagq3zdf55gwtnl223kuxbwq2q

A Survey of Embodied AI: From Simulators to Research Tasks [article]

Jiafei Duan, Samson Yu, Hui Li Tan, Hongyuan Zhu, Cheston Tan
2022 arXiv   pre-print
Lastly, this paper surveys the three main research tasks in embodied AI -- visual exploration, visual navigation and embodied question answering (QA), covering the state-of-the-art approaches, evaluation  ...  There has been an emerging paradigm shift from the era of "internet AI" to "embodied AI", where AI algorithms and agents no longer learn from datasets of images, videos or text curated primarily from the  ...  Acknowledgments This research is supported by the Agency for Science, Technology and Research (A*STAR), Singapore under its AME Programmatic Funding Scheme (Award #A18A2b0046) and the National Research  ... 
arXiv:2103.04918v8 fatcat:2zu4klcchbhnvmjej5ry3emu4u

Vidiam: Corpus-based Development of a Dialogue Manager for Multimodal Question Answering [chapter]

Boris van Schooten, Rieks op den Akker
2011 Interactive Multi-modal Question-Answering  
Based on the data, we created a dialogue act typology which helps translate user utterances to practical interactive QA strategies.  ...  We report on the collection and analysis of three QA dialogue corpora, involving textual follow-up utterances, multimodal follow-up questions, and speech dialogues.  ...  "Yes" means the step is performed but no figures are known; Baseline performance scores are shown between brackets.  ... 
doi:10.1007/978-3-642-17525-1_3 dblp:series/tanlp/SchootenA11 fatcat:6b7h3vzedfewhiixw3l5cfo5ze

Video Question-Answering Techniques, Benchmark Datasets and Evaluation Metrics Leveraging Video Captioning: A Comprehensive Survey

Khushboo Khurana, Umesh Deshpande
2021 IEEE Access  
Video-QA techniques rely on the attention mechanism to generate relevant results.  ...  The presented survey shows that recent works on Memory Networks, Generative Adversarial Networks, and Reinforced Decoders have the capability to handle the complexities and challenges of video-QA.  ...  This work takes advantage of both textual and visual modalities. 5) DEEP REINFORCEMENT LEARNING Previous methods perform well on videos with a single event.  ... 
doi:10.1109/access.2021.3058248 fatcat:bnjmbffxgreb5jkjuxethaqnde

Quality assurance for image-guided radiation therapy utilizing CT-based technologies: A report of the AAPM TG-179

Jean-Pierre Bissonnette, Peter A. Balter, Lei Dong, Katja M. Langen, D. Michael Lovelock, Moyed Miften, Douglas J. Moseley, Jean Pouliot, Jan-Jakob Sonke, Sua Yoo
2012 Medical Physics (Lancaster)  
The systems described are kilovolt and megavolt cone-beam CT, fan-beam MVCT, and CT-on-rails. A summary of the literature describing current clinical usage is also provided.  ...  Published data from long-term, repeated quality control tests form the basis of the proposed test frequencies and tolerances.  ...  The shifts identified in the measured flexmap are performed automatically by the image-guidance software.  ... 
doi:10.1118/1.3690466 pmid:22482616 fatcat:pg55536advbpna6yupsak4veyq

Theory of carrier phase ambiguity resolution

P. J. G. Teunissen
2003 Wuhan University Journal of Natural Sciences  
in the so-called 'fixed' baseline solution.  ...  Carrier phase ambiguity resolution is the key to high precision Global Navigation Satellite System (GNSS) positioning and navigation.  ...  is not normal, but multi-modal (see  ...  On the Quality of the 'Fixed' Baseline. In order to describe the quality of the 'fixed' baseline, one would like to know how close one can expect the baseline estimate  ... 
doi:10.1007/bf02899809 fatcat:hm5n2pohwvhlrkpkpfjvwepywm

A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks [article]

Unnat Jain, Luca Weihs, Eric Kolve, Ali Farhadi, Svetlana Lazebnik, Aniruddha Kembhavi, Alexander Schwing
2020 arXiv   pre-print
Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an impressive absolute gain of 25 percentage points over competitive decentralized baselines.  ...  It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities.  ...  We thank Mitchell Wortsman and Kuo-Hao Zeng for their insightful suggestions on how to clarify and structure this work.  ... 
arXiv:2007.04979v1 fatcat:2mbgl55d3feizntuqxg37ktovi

AAPM Spring Clinical Meeting - Abstract

2021 Journal of Applied Clinical Medical Physics  
Results: For a given IGRT system, the radiomic time series showed consistent trends; when a large shift from the baseline was observed for one radiomic feature, corresponding shifts, of varying magnitude  ...  Conclusion: The computer vision mechanical QA system can reproducibly perform mechanical QA tests with high accuracy.  ...  Calculation-based IMRT QA was performed with Mobius3D while measurement-based IMRT QA was performed using a MapCheck 2 device.  ... 
doi:10.1002/acm2.13289 pmid:34002941 fatcat:6oivewssf5gwnnztmhff4nldle

DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue [article]

Hung Le, Chinnadhurai Sankar, Seungwhan Moon, Ahmad Beirami, Alborz Geramifard, Satwik Kottur
2021 arXiv   pre-print
Building such dialogue systems is a challenging problem, involving various reasoning types on both visual and language inputs.  ...  The dataset is designed to contain minimal biases and has detailed annotations for the different types of reasoning over the spatio-temporal space of video.  ...  While image QA problems require a system to learn cross-modality interaction, video QA problems go beyond and capture visual information with temporal variance.  ... 
arXiv:2101.00151v2 fatcat:j4pv54mx3bhd7eyfs5eyzyoyju
Showing results 1 to 15 out of 325 results