BilVideo: Design and Implementation of a Video Database Management System

Mehmet Emin Dönderler, Ediz Şaykol, Umut Arslan, Özgür Ulusoy, Uğur Güdükbay
2005 Multimedia Tools and Applications
With advances in information technology, the amount of multimedia data captured, produced, and stored is increasing rapidly. As a consequence, multimedia content is widely used in many applications today, and the need to organize this data and to access it from repositories holding vast amounts of information has been a driving stimulus both commercially and academically. In keeping with this trend, first image and later, especially, video database management systems have attracted a great deal of attention, since traditional database systems are designed to deal with alphanumeric information only and are therefore not suitable for multimedia data. In this paper, a prototype video database management system, which we call BilVideo, is introduced. The system architecture of BilVideo is original in that it provides full support for spatio-temporal queries that contain any combination of spatial, temporal, object-appearance, external-predicate, trajectory-projection, and similarity-based object-trajectory conditions by a rule-based system built on a knowledge-base, while utilizing an object-relational database to respond to semantic (keyword, event/activity, and category-based), color, shape, and texture queries. The parts of BilVideo (Fact-Extractor, Video-Annotator, its Web-based visual query interface, and its SQL-like textual query language) are presented as well. Our query processing strategy is also briefly explained.
In this paper, BilVideo, a prototype video database management system, is introduced. The architecture of BilVideo is original in that it provides full support for spatio-temporal queries that contain any combination of spatial, temporal, object-appearance, external-predicate, trajectory-projection, and similarity-based object-trajectory conditions by a rule-based system built on a knowledge-base, while utilizing an object-relational database to respond to semantic (keyword, event/activity, and category-based), color, shape, and texture queries. The knowledge-base of BilVideo contains a fact-base and a comprehensive set of rules implemented in Prolog. The rules in the knowledge-base significantly reduce the number of facts that need to be stored for spatio-temporal querying of video data [11]. Moreover, the system's response time for different types of spatio-temporal queries is at interactive rates.
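To illustrate how rules can cut the number of stored facts, the following minimal Python sketch stores each directional fact once and derives the opposite relation on demand. Python stands in for Prolog here; the relation names, the `INVERSE` table, the `FACTS` entries, and the `holds` function are all hypothetical and are not taken from BilVideo's actual rule set.

```python
# Illustrative only: the relation names and the inverse-rule table below are
# assumptions, not BilVideo's actual Prolog rules.
INVERSE = {"west": "east", "east": "west", "north": "south", "south": "north"}

# Stored facts: (relation, object_a, object_b, first_frame, last_frame).
# Only one direction of each object pair is stored.
FACTS = {
    ("west", "car", "tree", 10, 40),
    ("north", "bird", "car", 25, 60),
}


def holds(relation, a, b):
    """Return the frame intervals where relation(a, b) holds, stored or derived."""
    intervals = []
    for rel, x, y, start, end in FACTS:
        if (rel, x, y) == (relation, a, b):
            intervals.append((start, end))  # fact is stored directly
        elif (INVERSE.get(rel), y, x) == (relation, a, b):
            intervals.append((start, end))  # fact is derived by the inverse rule
    return sorted(intervals)
```

With these two stored facts, `holds("east", "tree", "car")` is answered from the stored `west` fact, so the `east` fact never has to be written to the fact-base.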
The query processor interacts with both the knowledge-base and the object-relational database to respond to user queries that contain a combination of spatio-temporal, semantic, color, shape, and texture video queries. Intermediate query results returned from these two system components are integrated seamlessly by the query processor, and final results are sent to Web clients. BilVideo has a simple, yet very powerful SQL-like textual query language for spatio-temporal queries on video data [10]. For novice users, a visual query interface is provided. Both the query language and the visual query interface are currently being extended to support semantic, color, shape, and texture queries. To the best of our knowledge, BilVideo is by far the most feature-complete video DBMS, as it supports spatio-temporal, semantic, color, shape, and texture queries in an integrated manner. It is also unique in its support for retrieving any segment of a video clip in which the given query conditions are satisfied, regardless of how the video data is semantically partitioned. To our knowledge, none of the video query systems available today can return a subinterval of a scene as part of a query result, simply because video features are associated with scenes, which are defined to be the smallest semantic units of video data. In our approach, object trajectories, object-appearance relations, and spatio-temporal relations between video objects are represented as Prolog facts in a knowledge-base, and they are not explicitly related to semantic units of videos. Thus, BilVideo can return precise answers to user queries, when requested, in terms of frame intervals. Moreover, our treatment of directional relations between two video objects is also novel in that two overlapping objects may have directional relations defined for them with respect to one another, provided that the center points of the objects' Minimum Bounding Rectangles (MBRs) are different.
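As a rough illustration of the two ideas above, the sketch below derives a directional relation from MBR center points and then reports the maximal frame intervals in which that relation holds, independent of any scene boundaries. The source states only that center points decide the relation; the eight 45-degree sectors, the upward-growing y axis, the `(x_min, y_min, x_max, y_max)` MBR layout, and all function names are assumptions made for this example.

```python
import math

# Assumption: eight compass sectors of 45 degrees each, with y growing upward.
SECTORS = ["east", "northeast", "north", "northwest",
           "west", "southwest", "south", "southeast"]


def center(mbr):
    """Center point of an MBR given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = mbr
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def direction(mbr_a, mbr_b):
    """Direction of object A relative to B, or None if the centers coincide."""
    (ax, ay), (bx, by) = center(mbr_a), center(mbr_b)
    dx, dy = ax - bx, ay - by
    if dx == 0 and dy == 0:
        return None  # identical centers: no directional relation is defined
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return SECTORS[int(((angle + 22.5) % 360.0) // 45.0)]


def frame_intervals(track_a, track_b, relation):
    """Maximal runs of consecutive frames in which A is `relation` of B.

    Each track maps a frame number to that object's MBR in the frame.
    """
    intervals = []
    for frame in sorted(track_a.keys() & track_b.keys()):
        if direction(track_a[frame], track_b[frame]) != relation:
            continue
        if intervals and intervals[-1][1] == frame - 1:
            intervals[-1] = (intervals[-1][0], frame)  # extend the current run
        else:
            intervals.append((frame, frame))  # start a new run
    return intervals
```

Because only the center points are compared, two overlapping rectangles such as `(0, 0, 4, 4)` and `(2, 2, 10, 10)` still receive a relation (here, the first is southwest of the second), while rectangles with the same center receive none.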
Directional relations can be defined for overlapping objects because Allen's temporal interval algebra [2] is not used as a basis for the directional relation definition in our approach: to determine which directional relation holds between two objects, the center points of the objects' MBRs are used [11]. Furthermore, the BilVideo query language provides three aggregate functions, average, sum, and count, which may be very attractive for some applications, such as sports analysis systems and mobile object tracking systems, to collect statistical data on spatio-temporal events. The rest of the paper is organized as follows: A review of the research in the literature that is closely related to our work is given, in comparison to our work, in Section 2. The overall architecture of BilVideo, along with its knowledge-base structure, is briefly explained in Section 3. Section 4 presents the Fact-Extractor tool developed to populate the knowledge-base of the system with facts for spatio-temporal querying of video data. The tool also extracts color and shape histograms of objects, and stores them in the feature database for
doi:10.1007/s11042-005-2715-7 fatcat:nmcyutwfindu7cv6ktzzeip2by