
Automatic image semantic interpretation using social action and tagging data

Neela Sawant, Jia Li, James Z. Wang
2010 Multimedia tools and applications  
Our study builds on an interdisciplinary confluence of insights from image processing, data mining, human computer interaction, and sociology to describe the folksonomic features of users, annotations  ...  and images.  ...  Model-based techniques are conceptually similar to the traditional concept learning techniques, except that the input training data is mined from labels in social media and collaborative games.  ... 
doi:10.1007/s11042-010-0650-8 fatcat:kqu6kyess5f3re554jsueuuzem

IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System

Liangliang Cao, Shih-Fu Chang, Noel Codella, Courtenay V. Cotton, Dan Ellis, Leiguang Gong, Matthew L. Hill, Gang Hua, John R. Kender, Michele Merler, Yadong Mu, Apostol Natsev (+1 others)
2011 TREC Video Retrieval Evaluation  
Run 2 was interesting in assessing performance based on different kinds of high-level semantic information.  ...  Run 3 fused the low- and high-level feature information and was interesting in providing insight into the complementarity of this information for detecting events.  ...  The final 10 concepts were based on an in-house labeling performed as part of MED2010 in which 6626 10-second segments cut from the MED2010 development data were annotated with 10 audio-related labels  ... 
dblp:conf/trecvid/CaoCCCEGH0KMMNS11 fatcat:gg5hdmdh65bwlcksf4q467lt7m

Fusing Appearance and Spatio-Temporal Models for Person Re-Identification and Tracking

Andrew Tzer-Yeu Chen, Morteza Biglari-Abhari, Kevin I-Kai Wang
2020 Journal of Imaging  
(spatio-temporal-based tracking) and assigning identity labels based on tracks formed.  ...  This paper presents a model fusion approach, aiming towards combining both sources of information together in order to increase the accuracy of determining identity classes for detected people using re-ranking  ...  , and ceiling/floor weights for the β weight values in the model fusion module.  ... 
doi:10.3390/jimaging6050027 pmid:34460729 fatcat:kezgbojlubhmdmj7kroa7hpada
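The β-weighted fusion with ceiling/floor bounds mentioned in the snippet can be sketched generically; this is an illustrative construction, not the paper's implementation, and all names and values are made up:

```python
def fuse_scores(appearance, spatio_temporal, beta, floor=0.2, ceiling=0.8):
    """Weighted late fusion of two similarity scores in [0, 1]."""
    beta = max(floor, min(ceiling, beta))  # clamp the fusion weight
    return beta * appearance + (1.0 - beta) * spatio_temporal

# Re-rank candidate identities by fused score (appearance, spatio-temporal).
candidates = {"id_1": (0.9, 0.3), "id_2": (0.6, 0.8)}
ranked = sorted(candidates,
                key=lambda k: fuse_scores(*candidates[k], beta=0.5),
                reverse=True)
```

Clamping keeps either cue from dominating entirely, which is one plausible reading of the "ceiling/floor weights" the snippet refers to.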

Informedia@TRECVID 2011: Surveillance Event Detection

Lei Bao, Longfei Zhang, Shoou-I Yu, Zhen-zhong Lan, Lu Jiang, Arnold Overwijk, Qin Jin, Shohei Takahashi, Brian Langner, Yuanpeng Li, Michael Garbus, Susanne Burger (+2 others)
2011 TREC Video Retrieval Evaluation  
Different sliding window sizes and steps were adopted for different events based on the event duration priors.  ...  This approach is based on local spatio-temporal descriptors, called MoSIFT, and generated from pair-wise video frames.  ... 
dblp:conf/trecvid/BaoZYL0OJTLLGBM11 fatcat:a3eteiiit5cy7el5epiqvphqsi
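Per-event sliding windows driven by duration priors, as the snippet describes, can be sketched as follows; the event names and frame counts are purely illustrative:

```python
def sliding_windows(num_frames, window, step):
    """Yield (start, end) frame intervals covering the video."""
    for start in range(0, max(num_frames - window + 1, 1), step):
        yield (start, start + window)

# Hypothetical duration priors: (window size, step) in frames per event.
priors = {"CellToEar": (50, 10), "Embrace": (120, 30)}
windows = {ev: list(sliding_windows(300, w, s)) for ev, (w, s) in priors.items()}
```

Longer events get wider windows and coarser steps, trading temporal resolution for fewer classifier evaluations.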

Tweeting Cameras for Event Detection

Yuhui Wang, Mohan S. Kankanhalli
2015 Proceedings of the 24th International Conference on World Wide Web - WWW '15  
However, the different characteristics of these social and sensor data make such information fusion for event detection a challenging problem.  ...  These tweets are represented by a unified probabilistic spatio-temporal (PST) data structure which is then aggregated to a concept-based image (Cmage) as the common representation for visualization.  ... 
doi:10.1145/2736277.2741634 dblp:conf/www/WangK15 fatcat:c3i6mehycbcgjpkaiykcvnplau

Vehicle Re-Identification with Spatio-Temporal Model Leveraging by Pose View Embedding

Wenxin Huang, Xian Zhong, Xuemei Jia, Wenxuan Liu, Meng Feng, Zheng Wang, Shin'ichi Satoh
2022 Electronics  
Consequently, we design a two-branch framework for vehicle Re-ID, including a Keypoint-based Pose Embedding Visual (KPEV) model and a Keypoint-based Pose-Guided Spatio-Temporal (KPGST) model.  ...  These models are integrated into the framework, and the results of KPEV and KPGST are fused based on a Bayesian network.  ... 
doi:10.3390/electronics11091354 fatcat:ewaqghwiivha7h6wdfottsvati
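Fusing a visual match probability with a spatio-temporal one via Bayes' rule can be sketched minimally under a conditional-independence assumption; this is a generic construction, not the paper's actual Bayesian network:

```python
def bayes_fuse(p_visual, p_st, prior=0.5):
    """Posterior P(match | both cues), assuming the two cues are
    independent given the match / non-match hypothesis."""
    num = prior * p_visual * p_st
    den = num + (1.0 - prior) * (1.0 - p_visual) * (1.0 - p_st)
    return num / den
```

Two cues that agree push the posterior toward certainty, while a neutral cue (0.5) leaves the other cue's evidence unchanged.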

VideoQ: an automated content based video search system using visual cues
Shih-Fu Chang, William Chen, Horace J. Meng, Hari Sundaram, Di Zhong
1997 Proceedings of the fifth ACM international conference on Multimedia - MULTIMEDIA '97  
Content based visual queries have been primarily focused on still image retrieval.  ...  In this paper, we propose a novel, real-time, interactive system on the Web, based on the visual paradigm, with spatio-temporal attributes playing a key role in video retrieval.  ...  Tracking Objects: Motion, Color and Edges Our algorithm segments and tracks image regions based on the fusion of color, edge and motion information in the video shot.  ... 
doi:10.1145/266180.266382 dblp:conf/mm/ChangCMSZ97 fatcat:54gdz3aetjfy3jsxl6als27joi

Prolonged and distributed processing of facial identity in the human brain [article]

Rico Stecher, Ilkka Muukkonen, Viljami R Salmela, Sophie-Marie Rostalski, Géza Gergely Ambrus, Gyula Kovács
2021 bioRxiv   pre-print
Despite extensive prior fMRI and EEG/MEG research on the neural representations of familiar faces, we know little about the spatio-temporal dynamics of face identity information.  ...  Therefore, we applied a novel multimodal approach, by fusing the neuronal responses recorded in an fMRI and an EEG experiment.  ... 
doi:10.1101/2021.06.23.449599 fatcat:jdrjv6cvpvemvncpsilwk7eosa

High-level event recognition in unconstrained videos

Yu-Gang Jiang, Subhabrata Bhattacharya, Shih-Fu Chang, Mubarak Shah
2012 International Journal of Multimedia Information Retrieval  
However, due to the fast growing popularity of such videos, especially on the Web, solutions to this problem are in high demand and have attracted great interest from researchers.  ...  across different modalities, classification strategies, fusion techniques, etc.  ... 
doi:10.1007/s13735-012-0024-2 fatcat:mfzttic3svb4tho2xb6aczgp4y

Are You Watching Closely? Content-based Retrieval of Hand Gestures

Mahnaz Amiri Parian, Luca Rossetto, Heiko Schuldt, Stéphane Dupont
2020 Proceedings of the 2020 International Conference on Multimedia Retrieval  
Our proposed pipeline, I3DEF, is based on the extraction of spatio-temporal features from intermediate layers of an I3D network, a state-of-the-art network for action recognition, and the fusion of the  ...  In this paper, we explore the problem of identifying and retrieving gestures in a large-scale video dataset provided by the computer vision community and based on queries recorded in-the-wild.  ...  Content-based video retrieval is mainly based on feature extraction and similarity calculation.  ... 
doi:10.1145/3372278.3390723 dblp:conf/mir/ParianRSD20 fatcat:2jsdhqhg7jazbjfdpu54zfifq4

A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets

Khaled Bayoudh, Raja Knani, Fayçal Hamdaoui, Abdellatif Mtibaa
2021 The Visual Computer  
In particular, we summarize six perspectives from the current literature on deep multimodal learning, namely: multimodal data representation, multimodal fusion (i.e., both traditional and deep learning-based  ...  This involves the development of models capable of processing and analyzing the multimodal information uniformly.  ...  The emphasis on these modalities (RGB, depth, and flow data) is based on the fact that for many vision-based multimodal problems, it has been shown that the fusion of optical flow and depth information  ... 
doi:10.1007/s00371-021-02166-7 pmid:34131356 pmcid:PMC8192112 fatcat:jojwyc6slnevzk7eaiutlmlgfe

A review of EO image information mining [article]

Marco Quartulli, Igor G. Olaizola
2012 arXiv   pre-print
We analyze the state of the art of content-based retrieval in Earth observation image archives focusing on complete systems showing promise for operational implementation.  ...  The solutions envisaged for the issues related to feature simplification and synthesis, indexing, semantic labeling are reviewed. The methodologies for query specification and execution are analyzed.  ...  Fusion of panchromatic and multispectral information can be obtained by training several SOMs in parallel (one per feature).  ... 
arXiv:1203.0747v2 fatcat:nwiylcsdrnhthi753xcxwxgo7e

Extracting semantics from audio-visual content: the final frontier in multimedia retrieval

M.R. Naphade, T.S. Huang
2002 IEEE Transactions on Neural Networks  
We discuss how semantic retrieval is centered around concepts and context and also discuss various mechanisms for modeling concepts and context.  ... 
doi:10.1109/tnn.2002.1021881 pmid:18244476 fatcat:2joztr4jnbgedmsjvbzvqqe4su

Probability-based Dynamic Time Warping and Bag-of-Visual-and-Depth-Words for Human Gesture Recognition in RGB-D

Antonio Hernández-Vela, Miguel Ángel Bautista, Xavier Perez-Sala, Víctor Ponce-López, Sergio Escalera, Xavier Baró, Oriol Pujol, Cecilio Angulo
2014 Pattern Recognition Letters  
We present a methodology to address the problem of human gesture segmentation and recognition in video and depth image sequences.  ...  State-of-the-art RGB and depth features, including a newly proposed depth descriptor, are analysed and combined in a late fusion form.  ... 
doi:10.1016/j.patrec.2013.09.009 fatcat:jtgkoj25kfhezagzfokgq4wyqq

Visual Features with Spatio-Temporal-Based Fusion Model for Cross-Dataset Vehicle Re-Identification

Zakria, Jianhua Deng, Jingye Cai, Muhammad Umar Aftab, Muhammad Saddam Khokhar, Rajesh Kumar
2020 Electronics  
vehicle images; whereas, spatio-temporal patterns of unlabelled target datasets are learned by transferring siamese neural network classifiers trained on a source-labelled dataset.  ...  We finally calculate the composite similarity score of spatio-temporal patterns with siamese neural-network-based classifier visual features.  ... 
doi:10.3390/electronics9071083 fatcat:abkkg3vwqjcwjeimfjjuz4l67m
Showing results 1 — 15 out of 1,022 results