MSRC: Multimodal Spatial Regression with Semantic Context for Phrase Grounding

Kan Chen, Rama Kovvuri, Jiyang Gao, Ram Nevatia
2017 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval - ICMR '17  
We propose a novel Multimodal Spatial Regression with semantic Context (MSRC) system which not only predicts the location of ground truth based on proposal bounding boxes, but also refines prediction results  ...  Second, MSRC not only encodes the semantics of a query phrase, but also deals with its relation with other queries in the same sentence (i.e., context) by a context refinement network.  ...  CONCLUSION We proposed a novel Multimodal Spatial Regression with semantic Context (MSRC) system, which focuses on the phrase grounding problem.  ...
doi:10.1145/3078971.3078976 dblp:conf/mir/ChenKGN17 fatcat:sv6p5i2lbzh6pcjv4okdcncpeq

Editorial for the ICMR 2017 special issue

Michael S. Lew
2018 International Journal of Multimedia Information Retrieval  
The paper "MSRC: Multimodal Spatial Regression with Semantic Context for Phrase Grounding" by Kan Chen, Rama Kovvuri, Jiyang Gao and Ram Nevatia makes significant progress toward answering text queries  ...  The authors propose a system which applies a spatial regression network (SRN) to predict object locations and a context refinement network (CRN) which encodes context information and uses a novel joint  ... 
doi:10.1007/s13735-018-0148-0 fatcat:nbgvh2lgmfhixm6zcl4ig455ja

PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding [article]

Rama Kovvuri, Ram Nevatia
2018 arXiv   pre-print
In this paper, we present a framework that leverages information such as phrase category, relationships among neighboring phrases in a sentence and context to improve the performance of phrase grounding  ...  Also, in the absence of ground-truth spatial locations of the phrases (weakly supervised), we propose knowledge transfer mechanisms that leverage the framework of the PIN module.  ...  The existing approaches for grounding do not take full advantage of the rich context provided by visual context, semantic context and inter-phrase relationships.  ...
arXiv:1812.03213v1 fatcat:747uu5og4rgwvakcbzcx7z3ud4

Learning to Reason from General Concepts to Fine-grained Tokens for Discriminative Phrase Detection [article]

Maan Qraitem, Bryan A. Plummer
2021 arXiv   pre-print
Second, for phrases containing fine-grained mutually exclusive tokens (e.g., colors), we force the model into selecting only one applicable phrase for each region.  ...  Phrase detection requires methods to identify if a phrase is relevant to an image and then localize it if applicable.  ...  In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages  ...  Multimodal spatial regression with semantic context for phrase grounding.  ...
arXiv:2112.03237v1 fatcat:7p4doztfn5dyhdrcstlxexux4a

Learning Deep and Wide: A Spectral Method for Learning Deep Networks

Ling Shao, Di Wu, Xuelong Li
2014 IEEE Transactions on Neural Networks and Learning Systems  
The first ranked method uses early fusion for multimodal data.  ...  In particular, we demonstrate that multimodal feature learning will extract semantically meaningful shared representations, outperforming individual modalities, and the early fusion scheme's efficacy against  ...  The training set consists of roughly 400,000 frames and is divided into 33 minibatches, with the first 30 batches used for training and the remaining 3 for validation.  ...
doi:10.1109/tnnls.2014.2308519 pmid:25420251 fatcat:4mnl6tv2xnf3jpzwhp76cvl4ti

A perceptual learning model to discover the hierarchical latent structure of image collections

Davide Bacciu
The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning  ...  Particular emphasis is placed on validating the model as an effective tool for the unsupervised exploration of bio-medical data.  ...  Figure 67: Example of retrieved segments for the MSRC classes: cows, faces, grass and sky. Figure 68: Example of retrieved segments for the tree MSRC class.  ...
doi:10.6092/imtlucca/e-theses/7 fatcat:evv3d4ol7fcdhjgqma743gfn3y

Multimodal Emotion Recognition Based Human-Robot Interaction Enhancement

Fatemeh Noroozi
2020 unpublished
For instance, in [286, 208, 317], convolutional DNNs with seven layers were utilized for detecting the human body, as well as regression and representation of joint contexts. High-level global features  ...  Moreover, they suggested employing a part-based spatial model, together with a Convolutional Network (ConvNet).  ...
doi:10.13140/rg.2.2.17543.96164 fatcat:sqfxpplvgndwjhgkci7f5uexny