A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018; you can also visit the original URL.
The file type is application/pdf.
MSRC
2017
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval - ICMR '17
We propose a novel Multimodal Spatial Regression with semantic Context (MSRC) system which not only predicts the location of ground truth based on proposal bounding boxes, but also refines prediction results ...
Second, MSRC not only encodes the semantics of a query phrase, but also deals with its relation with other queries in the same sentence (i.e., context) by a context refinement network. ...
CONCLUSION We proposed a novel Multimodal Spatial Regression with semantic Context (MSRC) system, which focuses on the phrase grounding problem. ...
doi:10.1145/3078971.3078976
dblp:conf/mir/ChenKGN17
fatcat:sv6p5i2lbzh6pcjv4okdcncpeq
Editorial for the ICMR 2017 special issue
2018
International Journal of Multimedia Information Retrieval
The paper "MSRC: Multimodal Spatial Regression with Semantic Context for Phrase Grounding" by Kan Chen, Rama Kovvuri, Jiyang Gao and Ram Nevatia makes significant progress toward answering text queries ...
The authors propose a system which applies a spatial regression network (SRN) to predict object locations and a context refinement network (CRN) which encodes context information and uses a novel joint ...
doi:10.1007/s13735-018-0148-0
fatcat:nbgvh2lgmfhixm6zcl4ig455ja
PIRC Net : Using Proposal Indexing, Relationships and Context for Phrase Grounding
[article]
2018
arXiv
pre-print
In this paper, we present a framework that leverages information such as phrase category, relationships among neighboring phrases in a sentence and context to improve the performance of phrase grounding ...
Also, in the absence of ground-truth spatial locations of the phrases (weakly-supervised), we propose knowledge transfer mechanisms that leverage the framework of PIN module. ...
The existing approaches for grounding do not take full advantage of the rich context provided by visual context, semantic context and inter-phrase relationships. ...
arXiv:1812.03213v1
fatcat:747uu5og4rgwvakcbzcx7z3ud4
Learning to Reason from General Concepts to Fine-grained Tokens for Discriminative Phrase Detection
[article]
2021
arXiv
pre-print
Second, for phrases containing fine-grained mutually-exclusive tokens (e.g., colors), we force the model into selecting only one applicable phrase for each region. ...
Phrase detection requires methods to identify if a phrase is relevant to an image and then localize it if applicable. ...
In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages ...
... multimodal spatial regression with semantic context for phrase grounding. ...
arXiv:2112.03237v1
fatcat:7p4doztfn5dyhdrcstlxexux4a
Learning Deep and Wide: A Spectral Method for Learning Deep Networks
2014
IEEE Transactions on Neural Networks and Learning Systems
The first ranked method uses early fusion for multimodal data. ...
In particular, we demonstrate that multimodal feature learning will extract semantically meaningful shared representations, outperforming individual modalities, and the early fusion scheme's efficacy against ...
The training set consists of roughly 400,000 frames and is divided into 33 minibatches, with the first 30 batches for training and the remaining 3 batches for validation. ...
doi:10.1109/tnnls.2014.2308519
pmid:25420251
fatcat:4mnl6tv2xnf3jpzwhp76cvl4ti
A perceptual learning model to discover the hierarchical latent structure of image collections
2008
The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning ...
Particular emphasis is placed on validating the model as an effective tool for the unsupervised exploration of bio-medical data. ...
Figure 67: Example of retrieved segments for the MSRC classes: cows, faces, grass and sky.
Figure 68: Example of retrieved segments for the tree MSRC class. ...
doi:10.6092/imtlucca/e-theses/7
fatcat:evv3d4ol7fcdhjgqma743gfn3y
Multimodal Emotion Recognition Based Human-Robot Interaction Enhancement
2020
unpublished
For instance, in [286, 208, 317], convolutional DNNs with seven layers were utilized for detecting the human body, as well as regression and representation of joint contexts. ... high-level global features ...
Moreover, they suggested employing a part-based spatial model, together with Convolutional Network (ConvNet). ...
doi:10.13140/rg.2.2.17543.96164
fatcat:sqfxpplvgndwjhgkci7f5uexny