Multimedia content processing through cross-modal association

Dongge Li, Nevenka Dimitrova, Mingkun Li, Ishwar K. Sethi
2003 Proceedings of the eleventh ACM international conference on Multimedia - MULTIMEDIA '03  
Multimodal information processing has received considerable attention in recent years. The focus of existing research in this area has been predominantly on the use of fusion technology. In this paper, we suggest that cross-modal association can provide a new set of powerful solutions in this area. We investigate different cross-modal association methods using the linear correlation model. We also introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our ... earlier work on Latent Semantic Indexing (LSI) is extended for applications that use offline supervised training. As a promising research direction and practical application of cross-modal association, cross-modal information retrieval, where queries from one modality are used to search for content in another modality using low-level features, is then discussed in detail. Different association methods are tested and compared using the proposed cross-modal retrieval system. All these methods achieve significant dimensionality reduction. Among them, CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by experiments, cross-modal association provides many useful benefits, such as robust noise resistance and effective feature selection. Compared to CCA and LSI, the proposed CFA shows several advantages in analysis performance and feature usage. Its capability in feature selection and noise resistance also makes CFA a promising tool for many multimedia analysis applications.
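The abstract names CFA without stating its objective. A minimal sketch follows. It assumes the least-squares formulation in which, for paired feature matrices X and Y (one row per sample, e.g. visual and audio features of the same video segment), CFA seeks orthonormal transforms A and B minimizing ||XA - YB||_F^2. Because A and B are orthonormal, this is equivalent to maximizing trace(A^T X^T Y B), which the SVD of X^T Y solves. The mean-centering step, the helper names `cfa_fit` and `retrieve`, and Euclidean-distance ranking for cross-modal retrieval are illustrative assumptions, not the authors' code.

```python
import numpy as np


def cfa_fit(X, Y, k):
    """Sketch of Cross-modal Factor Analysis (assumed formulation).

    X: (n, p) features of modality 1; Y: (n, q) features of modality 2.
    Rows of X and Y are paired samples. Returns orthonormal projections
    A (p, k) and B (q, k) plus the column means used for centering.
    """
    # Assumption: each modality is mean-centered before the decomposition.
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # With orthonormal A, B, minimizing ||Xc A - Yc B||_F^2 reduces to
    # maximizing trace(A^T Xc^T Yc B); the SVD of Xc^T Yc gives the optimum.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    # Keeping only the top-k singular vectors gives the dimensionality reduction.
    A, B = U[:, :k], Vt.T[:, :k]
    return A, B, mx, my


def retrieve(query_x, Y_db, A, B, mx, my):
    """Hypothetical helper: rank modality-2 items for a modality-1 query."""
    qx = (query_x - mx) @ A            # query projected into the shared k-dim space
    zy = (Y_db - my) @ B               # database items projected into the same space
    dists = np.linalg.norm(zy - qx, axis=1)
    return np.argsort(dists)           # item indices, nearest first


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
R, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
Y = X @ R                                     # modality 2 is a rotated copy of modality 1
A, B, mx, my = cfa_fit(X, Y, 3)
ranking = retrieve(X[7], Y, A, B, mx, my)
```

In this toy setup, the second modality is an exact rotation of the first, so the two projections coincide and the query built from `X[7]` retrieves item 7 first. With real audio-visual features the match is only approximate. The number of retained factors `k` then trades retrieval accuracy against dimensionality.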
doi:10.1145/957013.957143 dblp:conf/mm/LiDLS03 fatcat:rs5i65fdmfhkbnxx2ibm5wsbu4