Semantic representation of multimedia content: Knowledge representation and semantic indexing

Phivos Mylonas, Thanos Athanasiadis, Manolis Wallace, Yannis Avrithis, Stefanos Kollias
2007 Multimedia Tools and Applications
In this paper we present a framework for unified, personalized access to heterogeneous multimedia content in distributed repositories. Focusing on semantic analysis of multimedia documents, metadata, user queries and user profiles, it contributes to bridging the gap between the semantic nature of user queries and raw multimedia documents. The proposed approach takes visual content analysis results as input, and also analyzes and exploits associated textual annotation, in order to extract the underlying semantics, construct a semantic index and classify documents to topics, based on a unified knowledge and semantics representation model. It may then accept user queries and, carrying out semantic interpretation and expansion, retrieve documents from the index and rank them according to user preferences, similarly to text retrieval. All processes are based on a novel semantic processing methodology, employing fuzzy algebra and principles of taxonomic knowledge representation. The first part of this work, presented in this paper, deals with data and knowledge models, manipulation of multimedia content annotations and semantic indexing, while the second part will continue with the use of the extracted semantic information for personalized retrieval.

Introduction

Over the last decade, multimedia content indexing and retrieval has been influenced by important progress in numerous fields, such as digital content production, archiving, multimedia signal processing and analysis, computer vision, artificial intelligence and information retrieval [9]. One major obstacle that multimedia retrieval systems still need to overcome in order to gain widespread acceptance is the semantic gap [50, 70, 92]; it remains an open problem, and in this approach we provide a partial contribution towards its solution. The semantic gap refers to the extraction of the semantic content of multimedia documents, the interpretation of user information needs and requests, as well as the matching between the two. This hindrance becomes even harder to overcome when attempting to access vast amounts of multimedia information encoded, represented and described in different formats and levels of detail. Although this gap has been acknowledged for a long time, multimedia analysis approaches are still divided into two main categories: the low-level multimedia analysis methods and tools on the one hand (e.g.
[51, 58, 59, 62]) and the high-level semantic annotation methods and tools on the other hand (e.g. [11, 39, 80, 83]). Only recently have state-of-the-art multimedia analysis systems started using semantic knowledge technologies, as the latter are defined by notions like the Semantic Web [17, 88] and ontologies [34, 76]. The advantages of using Semantic Web technologies for the creation, manipulation and post-processing of multimedia metadata are demonstrated in numerous activities [77], trying to provide "semantics to semantics". Digital video is the most demanding and complex data structure, due to its large number of spatiotemporal interrelations; video understanding and indexing is a key step towards more efficient manipulation of visual media, presuming semantic information extraction. As is extensively shown in the literature [29, 45, 79], multimedia standards, such as MPEG-7 [67] and 53], seek to consolidate and render effective the infrastructure for the delivery and management of multimedia content, and do provide important functionalities when dealing with aspects like the description of objects and associated metadata [71]. For instance, the Multimedia Description Scheme tools [14] specified by the MPEG-7 standard for describing multimedia content include, among others, tools that represent the structure and semantics of multimedia data [10, 12, 15]. However, the important process of extracting semantic descriptions from the content with the corresponding metadata lies outside the scope of this standard, motivating heavy research efforts in the direction of automatic annotation of multimedia content [6, 13, 22, 93]. The need for machine-understandable representation and manipulation of the semantics associated with the MPEG-7 Description Schemes and Descriptors led to the development of ontologies for specific parts of 43, 44, 81].
In the approach proposed by Hunter [44], the trials and tribulations of building such an ontology are presented, as well as its exploitation and reusability by other communities on the Semantic Web. In [81], on the other hand, the semantic part of MPEG-7 is translated into an ontology that serves as the core one for the attachment of domain-specific ontologies, in order to achieve MPEG-7 compliant domain-specific annotations, hence the initial conceptualization of the domain
doi:10.1007/s11042-007-0161-4 fatcat:6b7dmsan7jee3j24lbvlrajasu