Multimedia content processing through cross-modal association

Dongge Li, Nevenka Dimitrova, Mingkun Li, Ishwar K. Sethi
Proceedings of the eleventh ACM international conference on Multimedia (MULTIMEDIA '03), 2003
The demand for technology that can process cross-modal information is becoming increasingly obvious and urgent. Existing research in multimodal information analysis has predominantly focused on fusion technology. In this paper, however, we demonstrate that cross-modal association can also provide a set of powerful approaches for multimedia information processing. We investigate different cross-modal association methods based on the linear correlation model, and we introduce a novel method for cross-modal association called Cross-modal Factor Analysis (CFA). Our earlier work on Latent Semantic Indexing (LSI) is extended for applications that need global or off-line training. As a promising research direction and practical application of cross-modal association, cross-modal retrieval is then discussed in detail. The different association methods are tested and compared using the proposed cross-modal retrieval system. All of these methods provide effective dimensionality reduction; among them, CFA gives the best retrieval performance. Finally, this paper addresses the use of cross-modal association to detect talking heads. The CFA method achieves 91.1% detection accuracy, while LSI and Canonical Correlation Analysis (CCA) achieve 66.1% and 73.9% accuracy, respectively. As shown by the experiments in cross-modal retrieval and talking-head analysis, CFA provides a powerful tool for analyzing semantic associations between different modalities. Compared to CCA, CFA offers better noise resistance and places no constraints on the features to be processed. Its strong feature-selection capability also makes CFA a promising tool for many multimedia analysis applications.
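To make the cross-modal retrieval idea concrete, the sketch below is a minimal illustration of retrieval with a linear correlation model, using CCA (one of the methods compared in the abstract) as implemented in scikit-learn. The synthetic paired audio/visual features, the five-component shared subspace, and the cosine-similarity ranking are illustrative assumptions for this sketch, not the authors' CFA algorithm, dataset, or exact pipeline.

```python
# Hedged sketch: cross-modal retrieval with a linear correlation model (CCA).
# The data and feature dimensions below are synthetic placeholders.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)

# Toy paired features: audio descriptors (X) and visual descriptors (Y)
# extracted from the same clips, so row i of X and Y describe clip i.
n_clips, d_audio, d_visual = 200, 40, 60
latent = rng.normal(size=(n_clips, 5))  # shared semantic factors
X = latent @ rng.normal(size=(5, d_audio)) + 0.1 * rng.normal(size=(n_clips, d_audio))
Y = latent @ rng.normal(size=(5, d_visual)) + 0.1 * rng.normal(size=(n_clips, d_visual))

# Learn coupled projections that maximize correlation between the two
# modalities; this also serves as dimensionality reduction (to 5 components).
cca = CCA(n_components=5)
cca.fit(X, Y)
Xc, Yc = cca.transform(X, Y)

def cosine_sim(query, items):
    """Cosine similarity between one projected query and all projected items."""
    query = query / np.linalg.norm(query)
    items = items / np.linalg.norm(items, axis=1, keepdims=True)
    return items @ query

# Cross-modal retrieval: rank visual items against an audio query (or vice
# versa) in the shared correlated subspace.
query_idx = 7
scores = cosine_sim(Xc[query_idx], Yc)
print("Top-5 visual matches for audio query", query_idx, ":", np.argsort(-scores)[:5])
```

In this toy setup the top match for an audio query is typically its own clip's visual features, which mirrors the retrieval evaluation described in the paper; substituting CFA or LSI for CCA would change only how the shared subspace is computed.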
doi:10.1145/957142.957143