Filters








625 Hits in 11.9 sec

Images Don't Lie: Transferring Deep Visual Semantic Features to Large-Scale Multimodal Learning to Rank [article]

Corey Lynch, Kamelia Aryafar, Josh Attenberg
2015 arXiv   pre-print
In this paper, we introduce a multimodal learning to rank model that combines these traditional features with visual semantic features transferred from a deep convolutional neural network.  ...  In a large scale experiment using data from the online marketplace Etsy, we verify that moving to a multimodal representation significantly improves ranking quality.  ...  Acknowledgments The authors would like to thank Arjun Raj Rajanna, Christopher Kanan, Robert Hall, and Will Gallego for fruitful discussions during the course of this paper.  ... 
arXiv:1511.06746v1 fatcat:dhwzfckyb5e67f7qxmzsty7tee

Cross-modal Common Representation Learning by Hybrid Transfer Network [article]

Xin Huang, Yuxin Peng, Mingkuan Yuan
2017 arXiv   pre-print
In single-modal scenario, similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (as ImageNet).  ...  However, it is challenging to transfer useful knowledge from single-modal (as image) source domain to cross-modal (as image/text) target domain.  ...  al., 2012] , which is from ImageNet large-scale visual recognition challenge (ILSVRC) 2012.  ... 
arXiv:1706.00153v2 fatcat:mrbbnvji3zdyjafae6mxmcg4ni

Cross-modal Common Representation Learning by Hybrid Transfer Network

Xin Huang, Yuxin Peng, Mingkuan Yuan
2017 Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence  
In single-modal scenario, similar problem is usually relieved by transferring knowledge from large-scale auxiliary datasets (as ImageNet).  ...  However, it is challenging to transfer useful knowledge from single-modal (as image) source domain to cross-modal (as image/text) target domain.  ...  al., 2012] , which is from ImageNet large-scale visual recognition challenge (ILSVRC) 2012.  ... 
doi:10.24963/ijcai.2017/263 dblp:conf/ijcai/HuangPY17 fatcat:c2u5yzhbcrb7nlwuwuhbbgsnwi

Integrating Scene Text and Visual Appearance for Fine-Grained Image Classification [article]

Xiang Bai, Mingkun Yang, Pengyuan Lyu, Yongchao Xu, Jiebo Luo
2017 arXiv   pre-print
Text in natural images contains rich semantics that are often highly relevant to objects or scene. In this paper, we focus on the problem of fully exploiting scene text for visual understanding.  ...  The main idea is combining word representations and deep visual features into a globally trainable deep convolutional neural network.  ...  For example, in [16] , the authors adopted a deep ranking model to learn similarity metric directly from images.  ... 
arXiv:1704.04613v2 fatcat:njyzdagkondknmvytbgshm6mhu

A Birds Eye View on Knowledge Graph Embeddings, Software Libraries, Applications and Challenges [article]

Satvik Garg, Dwaipayan Roy
2022 arXiv   pre-print
In addition, reinforcement learning techniques are reviewed to model complex queries as a link prediction problem.  ...  We discuss existing KGC approaches, including the state-of-the-art Knowledge Graph Embeddings (KGE), not only on static graphs but also for the latest trends such as multimodal, temporal, and uncertain  ...  This reasoning cycle only aims to learn the features as boundaries present in the KG. In simple terms, it cannot be further transferred to some different KG.  ... 
arXiv:2205.09088v1 fatcat:c4gfzg4ldras3axpf5wvbldstm

Human Action Recognition and Prediction: A Survey [article]

Yu Kong, Yun Fu
2022 arXiv   pre-print
Derived from rapid advances in computer vision and machine learning, video analysis tasks have been moving from inferring the present state to predicting the future state.  ...  These two tasks have become particularly prevalent topics recently because of their explosively emerging real-world applications, such as visual surveillance, autonomous driving vehicle, entertainment,  ...  The 3D ConvNet [89, 90] was later extended to a modern deep architecture called C3D [255] that learns on large-scale datasets.  ... 
arXiv:1806.11230v3 fatcat:2a2d7fuezbdqzfgrjwkcuqvmbu

Multi-Modal 3D Object Detection in Autonomous Driving: a Survey [article]

Yingjie Wang, Qiuyu Mao, Hanqi Zhu, Yu Zhang, Jianmin Ji, Yanyong Zhang
2021 arXiv   pre-print
To bridge this gap and motivate future research, this survey devotes to review recent fusion-based 3D detection deep learning models that leverage multiple sensor data sources, especially cameras and LiDARs  ...  However, achieving full autonomy remains a daunting task due to the complex and dynamic driving environment.  ...  During the deep learning-based image processing, deep neural networks take 2D images as input and extracts feature maps with a set of convolution operations (Kim, 2014; Yosinski et al., 2014) .  ... 
arXiv:2106.12735v2 fatcat:5twzbk4yhrcfzddp7zghnsivna

Efficient Visual Recognition with Deep Neural Networks: A Survey on Recent Advances and New Directions [article]

Yang Wu, Dingheng Wang, Xiaotong Lu, Fan Yang, Guoqi Li, Weisheng Dong, Jianbo Shi
2021 arXiv   pre-print
Deep neural networks (DNNs) have largely boosted their performances on many concrete tasks, with the help of large amounts of training data and new powerful computation resources.  ...  This paper attempts to provide a systematic summary via a comprehensive survey which can serve as a valuable reference and inspire both researchers and practitioners who work on visual recognition problems  ...  , especially orienting to visual recognition neural networks which are always large-scaled due to high-dimensional raw visual data.  ... 
arXiv:2108.13055v2 fatcat:nf3lymdbvzgl7otl7gjkk5qitq

Visual and Object Geo-localization: A Comprehensive Survey [article]

Daniel Wilson, Xiaohan Zhang, Waqas Sultani, Safwan Wshah
2021 arXiv   pre-print
As massive datasets of GPS tagged media have rapidly become available due to smartphones and the internet, and deep learning has risen to enhance the performance capabilities of machine learning models  ...  , the fields of visual and object geo-localization have emerged due to its significant impact on a wide range of applications such as augmented reality, robotics, self-driving vehicles, road maintenance  ...  The proposed model authors proposed two deep learning methods to geolocalize was end-to-end trainable using a weakly supervised ranking images.  ... 
arXiv:2112.15202v1 fatcat:ipwas72ro5ho5fjiakm6de7ji4

Measuring the Biases and Effectiveness of Content-Style Disentanglement [article]

Xiao Liu, Spyridon Thermos, Gabriele Valvano, Agisilaos Chartsias, Alison O'Neil, Sotirios A. Tsaftaris
2021 arXiv   pre-print
in spatially equivariant tasks (e.g. image-to-image translation).  ...  To achieve this, they employ different model design, learning objective, and data biases.  ...  We use DeepFashion [39], a large-scale dataset with over 800,000 diverse images of people in different poses and clothing, that also has annotations of body joints.  ... 
arXiv:2008.12378v4 fatcat:mqo5s4gf2jbgrbt55vkqniujpa

Visual and semantic representations predict subsequent memory in perceptual and conceptual memory tests [article]

Simon W Davis, Benjamin Geib, Erik Wing, Wei-Chun Wang, Mariam Hovhannisyan, Zachary Monge, Roberto Cabeza
2020 bioRxiv   pre-print
Three levels of visual representations corresponding to early, middle, and late visual processing stages were based on a deep neural network.  ...  Three levels of semantic representations were based on normative Observed ("is round"), Taxonomic ("is a fruit"), and Encyclopedic features ("is sweet").  ...  In fact, this region is assumed to bind multimodal visuo-semantic representations (Yazar Y et al. 2014; Tibon R et al. 2019) and to play a role in both visual and semantic tasks (Binder JR et al. 2009  ... 
doi:10.1101/2020.02.11.944801 fatcat:zpihku26brduzmrg7ckpadi3my

Signal Processing and Machine Learning for Mental Health Research and Clinical Applications [Perspectives]

Daniel Bone, Chi-Chun Lee, Theodora Chaspari, James Gibson, Shrikanth Narayanan
2017 IEEE Signal Processing Magazine  
One viable approach is to use transfer learning, wherein similar learning tasks are used to bias models to learn more quickly how to perform related tasks.  ...  Consider image processing: the world's leading experts have only recently been able to robustly identify animals within a photograph; but transfer learning of those models are currently being applied to  ... 
doi:10.1109/msp.2017.2718581 fatcat:hd7hqslz6zhndasv46dfhlgtbm

From Recognition to Prediction: Analysis of Human Action and Trajectory Prediction in Video [article]

Junwei Liang
2021 arXiv   pre-print
With the advancement in computer vision deep learning, systems now are able to analyze an unprecedented amount of rich visual information from videos to enable applications such as autonomous driving,  ...  To enable optimal future human behavioral forecasting, it is crucial for the system to be able to detect and analyze human activities as well as scene semantics, passing informative features to the subsequent  ...  However, the stacked capsule autoencoder in [107] is designed for 2D images, and too expensive for large scale visual recognition.  ... 
arXiv:2011.10670v3 fatcat:mlom5zqk6jdvjndcsfwimpj7xu

Word meaning in minds and machines [article]

Brenden M. Lake, Gregory L. Murphy
2021 arXiv   pre-print
Current models are too strongly linked to the text-based patterns in large corpora, and too weakly linked to the desires, goals, and beliefs that people express through words.  ...  Machines have achieved a broad and growing set of linguistic competencies, thanks to recent progress in Natural Language Processing (NLP).  ...  partially funded by NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science, and DARPA Award A000011479; PO: P000085618 for the Machine Common Sense program, both to  ... 
arXiv:2008.01766v3 fatcat:vi4zp7ebxfcepesrz5b2vkxdcu

Embedding Comparator: Visualizing Differences in Global Structure and Local Neighborhoods via Small Multiples [article]

Angie Boggust, Brandon Carter, Arvind Satyanarayan
2022 arXiv   pre-print
Embeddings mapping high-dimensional discrete input to lower-dimensional continuous vector spaces have been widely adopted in machine learning applications as a way to capture domain semantics.  ...  In evaluations with 15 participants, we find our system accelerates comparisons by shifting from laborious manual specification to browsing and manipulating visualizations.  ...  For example, "7" has become more closely related to positive adjectives as a result of numeric scales used to rank movies within the reviews.  ... 
arXiv:1912.04853v3 fatcat:zfool6r745hlpef4nu7tl3gxi4
« Previous Showing results 1 — 15 out of 625 results