Semantic Relationships in Multi-modal Graphs for Automatic Image Annotation [chapter]

Vassilios Stathopoulos, Jana Urban, Joemon Jose
Lecture Notes in Computer Science  
It is important to integrate contextual information in order to improve the inaccurate results of current approaches for automatic image annotation.  ...  Graph based representations allow incorporation of such information. However, their behaviour has not been studied in this context.  ...  Graphs and graph learning algorithms provide an interesting alternative for the problem of inference using multi-modal representations of documents.  ... 
doi:10.1007/978-3-540-78646-7_47 dblp:conf/ecir/StathopoulosUJ08 fatcat:dalixrzdzrf6dip6rwybhvlmzm

3D Scene Graph: A Structure for Unified Semantics, 3D Space, and Camera [article]

Iro Armeni, Zhi-Yang He, JunYoung Gwak, Amir R. Zamir, Martin Fischer, Jitendra Malik, Silvio Savarese
2019 arXiv   pre-print
Aspiring to have one unified structure that hosts diverse types of semantics, we follow the Scene Graph paradigm in 3D, generating a 3D Scene Graph.  ...  A comprehensive semantic understanding of a scene is important for many applications - but in what space should diverse semantic information (e.g., objects, scene categories, material types, texture, etc  ...  Conclusion We discussed the grounding of multi-modal 3D semantic information in a unified structure that establishes relationships between objects, 3D space, and camera.  ... 
arXiv:1910.02527v1 fatcat:ipibuqyurbgkvkumc75eaj3jvi

A Survey on Automatic Image Annotation and Trends of the New Age

Feichao Wang
2011 Procedia Engineering  
In this paper, different approaches of automatic annotation are reviewed: 1) generative model based image annotation, 2) discriminative model based image annotation, 3) Graph model based image annotation  ...  Automatic image annotation could help to retrieval images in a large scale image database more rapidly and precisely.  ...  In paper [16], Fan et al. proposed a hierarchical classification framework for bridging the semantic gap effectively and achieving multi-level image annotation automatically.  ... 
doi:10.1016/j.proeng.2011.11.2526 fatcat:ngghusjc3zefnl55e6bgdiju4a

Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval

Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Haijun Shan, Xuanjing Huang, Jianqing Fan
2022 Proceedings of the 2022 International Conference on Multimedia Retrieval  
In practice, multi-grained semantic labels are automatically constructed for a query image in both sentence-level and phrase-level.  ...  In order to integrate both supervision of sentence-level and phrase-level, we propose Semantic Structure Aware Multimodal Transformer (SSAMT) for multi-modal representation learning.  ...  We concatenate the sentence and its phrases in language side, while image and its regions in vision side, then present mask transformer for jointly cross-modality modeling with multi-grained semantics.  ... 
doi:10.1145/3512527.3531368 fatcat:umzmktgwazcbhjl4e2zbjj6jum

Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval [article]

Zhihao Fan, Zhongyu Wei, Zejun Li, Siyuan Wang, Haijun Shan, Xuanjing Huang, Jianqing Fan
2021 arXiv   pre-print
In practice, multi-grained semantic labels are automatically constructed for a query image in both sentence-level and phrase-level.  ...  In order to integrate both supervision of sentence-level and phrase-level, we propose Semantic Structure Aware Multimodal Transformer (SSAMT) for multi-modal representation learning.  ...  The multi-grained semantic labels construction is to automatically collect semantic labels from annotated sentences of the query image, the cross-modality representation learning with multi-grained semantics  ... 
arXiv:2109.05523v1 fatcat:w6lov6xpejhwbc46iycjj5wy2e

Understanding Art through Multi-Modal Retrieval in Paintings [article]

Noa Garcia, Benjamin Renoust, Yuta Nakashima
2019 arXiv   pre-print
We introduce the use of multi-modal techniques in the field of automatic art analysis by 1) collecting a multi-modal dataset with fine-art paintings and comments, and 2) exploring robust visual and textual  ...  representations in artistic images.  ...  We first introduce a multi-modal dataset for visual arts, in which each image of a painting is associated with an artistic comment ( Figure 1 ).  ... 
arXiv:1904.10615v1 fatcat:7szy5ph7tbb43nxby7ocizgtwe

Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships [article]

Chao Lou, Wenjuan Han, Yuhuan Lin, Zilong Zheng
2022 arXiv   pre-print
Moreover, we benchmark our dataset by proposing a contrastive learning (CL)-based framework VLGAE, short for Vision-Language Graph Autoencoder.  ...  Previous works have shown compelling comprehensive results by building hierarchical structures for visual scenes (e.g., scene graphs) and natural languages (e.g., dependency trees), individually.  ...  In such a heterogeneous graph, semantically consistent instances across two graphs (DT and SG) are aligned in different levels, which maximizes the retention of the representation from two modalities.  ... 
arXiv:2203.14260v3 fatcat:65xsxe3cdzefdf3uadnj3d27dm

Multi-Modal Knowledge Representation Learning via Webly-Supervised Relationships Mining

Fudong Nian, Bing-Kun Bao, Teng Li, Changsheng Xu
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
The more and more rich available multi-modal data on Internet also drive us to explore a novel approach for KRL in multi-modal way, and overcome the limitations of previous single-modal based methods.  ...  We build a large-scale multimodal relationship dataset (MMR-D) and the experimental results show that our framework achieves excellent performance in zero-shot multi-modal retrieval and visual relationship  ...  of multi-modal knowledge graph.  ... 
doi:10.1145/3123266.3123443 dblp:conf/mm/NianBLX17 fatcat:e5wyg4iykzgexcb6vohf2okshm

SVGraph: Learning Semantic Graphs from Instructional Videos [article]

Madeline C. Schiappa, Yogesh S. Rawat
2022 arXiv   pre-print
We attempt to overcome "black box" learning limitations by presenting Semantic Video Graph or SVGraph, a multi-modal approach that utilizes narrations for semantic interpretability of the learned graphs  ...  We perform experiments on multiple datasets and demonstrate the interpretability of SVGraph in semantic graph learning.  ...  Semantic Attention In order to make our graph interpretable without annotations, we learn semantically relevant features for our nodes.  ... 
arXiv:2207.08001v1 fatcat:546obyqea5d4zbc7xrdygskfde

Semantic-Based Video Retrieval Survey

Shaimaa Toriah Mohamed Toriah, Atef Zaki Ghalwash, Aliaa A. A. Youssif
2018 Journal of Computer and Communications  
Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more details.  ...  Digital data include image, text, and video. Video represents a rich source of information. Thus, there is an urgent need to retrieve, organize, and automate videos.  ...  It proved that the factor graph can handle the stochastic relationships between features extracted from the multi-modality.  ... 
doi:10.4236/jcc.2018.68003 fatcat:qfep2py7ufhwxea7vazpujltja

A Knowledge-based Image Retrieval System Integrating Semantic and Visual Features

Olfa Allani, Hajer Baazaoui Zghal, Nedra Mellouli, Herman Akdag
2016 Procedia Computer Science  
The idea is to automatically build a modular ontology for semantic information and organize visual features in a graph-based model.  ...  In this paper, we propose an image retrieval system integrating semantic and visual features.  ...  Multi-modality ontologies include semantic concepts and relations as well as visual classes of images resulting from image classification and relationships extracted from low-level features.  ... 
doi:10.1016/j.procs.2016.08.188 fatcat:aicfbk4yubgjhbcat46sbdxykm

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification [article]

Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen
2020 arXiv   pre-print
In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi label classification.  ...  Multi-label image and video classification are fundamental yet challenging tasks in computer vision.  ...  We thank all anonymous reviewers for their constructive comments.  ... 
arXiv:1912.07872v2 fatcat:rmorsx4zufdftelastk5wxpfky

Layout Aware Semantic Element Extraction for Sustainable Science & Technology Decision Support

Hyuntae Kim, Jongyun Choi, Soyoung Park, Yuchul Jung
2022 Sustainability  
For now, we succeeded in extracting about 6 million semantic elements from 49,649 PDFs.  ...  Therefore, this paper proposes LA-SEE (LAME and Vi-SEE), a knowledge graph construction framework that simultaneously extracts meta-information and useful image objects from S&T documents in various layout  ...  However, due to the computer-annotated data quality, the experiments on the DocBank are just performed with image features rather than multi-modal.  ... 
doi:10.3390/su14052802 fatcat:eew4bb5q55ccpavk6yxroogsgq

Hierarchical classification for automatic image annotation

Jianping Fan, Yuli Gao, Hangzai Luo
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
In this paper, a hierarchical classification framework has been proposed for bridging the semantic gap effectively and achieving multi-level image annotation automatically.  ...  hierarchical image classifier training with automatic error recovery.  ...  CONCLUSIONS In this paper, we have proposed a novel algorithm for automatic multi-level image annotation via hierarchical classification.  ... 
doi:10.1145/1277741.1277763 dblp:conf/sigir/FanGL07 fatcat:glefia7ybzb7fh5ormjohd4zkq

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen
2020 Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence and the Twenty-Eighth Innovative Applications of Artificial Intelligence Conference  
In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi-label classification.  ...  Multi-label image and video classification are fundamental yet challenging tasks in computer vision.  ...  We thank all anonymous reviewers for their constructive comments.  ... 
doi:10.1609/aaai.v34i07.6964 fatcat:nte7xum7ozdbpgdrribz6r2rd4