Searching 100M Images by Content Similarity
Italian Research Conference on Digital Library Management Systems
Digital Library Systems (DLSs) are software applications implementing the functionality needed to operate over the (possibly compound) objects of a Digital Library. In the past, DLS development was mainly characterised by a from-scratch approach. Only recently have DLS developers started adopting Digital Library Management Systems (DLMSs), special software systems that ease the development process by providing facilities common to DLSs. The Digital Library
... has not yet reached a formal agreement on the detailed functionality these systems must implement in terms of content management support. This work focuses on the foundational aspects of content management for DLMSs by discussing a data model whose novelty lies in (i) identifying modeling primitives capable of expressing the nature of the compound objects of any DLS and (ii) re-introducing the notion of object type as a means of supporting safe, optimized and efficient DLS implementations. The model will inspire the realization of the first DLMS supporting the expressiveness of compound objects and the traditional capabilities of static typing.
M. Agosti, F. Esposito, C. Thanos (Eds), Post-proceedings of IRCDL 2009
(i) Including the data abstractions necessary for describing any digital library information object model in terms of compound objects; (ii) Inspiring the design of DLMSs supporting safe, efficient and optimized storage, access and search of compound objects. In the following we motivate the foundations of a typed compound object data model by means of real-case scenario requirements, then conclude by illustrating our model proposal.
Model Requirements
As explained in the previous section, flexible DLMSs support compound object data models devised to meet the requirements of developers wishing to realize DLSs. A desirable feature of such models is that they be both "fully expressive" and "minimal", that is, (i) the set of primitives they provide should be capable of describing any information object model and (ii) removing any one of these primitives would compromise the expressivity of the data model language, i.e. leave a subset of DLS information object models out of our solution domain. To identify such sets of primitives, a study of common DL behavioral patterns is necessary. Consider the following DLS real-case scenarios.
Real-case 1 (Catalogues). DLSs for management of metadata record catalogues, for example in standard library administration.
In this case the records describe entities, i.e. publications, whose digital payload is not stored within the DLS. The metadata records may conform to a standard bibliographic metadata format, such as Dublin Core or MARC, or to a proprietary format preferred by the DLS user community. The DLS offers efficient search functionality over the metadata records, based on the given format.
Real-case 2 (Archives). DLSs for management of multimedia digital objects accompanied by their metadata descriptions. In this case, the digital objects are stored in the DLS back-end, can be searched through their metadata, and eventually accessed through appropriate protocols, e.g. streaming for video digital objects. In principle, the same digital object may be described by several metadata records, conforming to metadata formats specific to the respective application scenarios. The DLS offers efficient format-based search functionality over the metadata records. Note that the same scenario is reflected by Institutional Repositories, with publications and bibliographic metadata.
Real-case 3 (Enhanced Publication Management). A special DLS for management of enhanced publication objects, intended as graphs of digital objects consisting of one publication object, with zero or one Dublin Core metadata record description and with relations to other publication objects cited by the publication. The DLS is capable of (i) ingesting or importing publication objects, i.e. metadata records and/or payloads, from other domains in the form of "simple" compound objects (digital objects with no relationships) and (ii) allowing the user community to construct enhanced publications by specifying reference relationships over this pool of simple compound objects. It is important to observe that the same simple objects can be part of several enhanced publications, and that relationships are specified in a second stage and are therefore kept apart from the objects.
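
The two observations closing Real-case 3 can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions, not the data model proposed in this work: the names `DublinCoreRecord`, `SimpleObject`, `EnhancedPublication` and `Pool`, the two Dublin Core fields shown, and the example URIs are all hypothetical. It shows simple compound objects carrying zero or one Dublin Core record, and citation relationships stored in the enhanced publication rather than in the objects, so that one object can belong to several enhanced publications.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class DublinCoreRecord:
    """Hypothetical stand-in for a Dublin Core record (two fields only)."""
    title: str
    creator: str


@dataclass(frozen=True)
class SimpleObject:
    """A 'simple' compound object: a publication with no relationships."""
    object_id: str
    payload_uri: Optional[str] = None             # payload may be absent
    metadata: Optional[DublinCoreRecord] = None   # zero or one DC record

    def __post_init__(self) -> None:
        # Assumption: an ingested object carries a record and/or a payload.
        if self.payload_uri is None and self.metadata is None:
            raise ValueError("object needs a payload, a metadata record, or both")


@dataclass
class EnhancedPublication:
    """A graph over pooled objects; the edges live here, not in the objects."""
    root_id: str
    cites: Set[Tuple[str, str]] = field(default_factory=set)  # (citing, cited)

    def members(self) -> Set[str]:
        ids = {self.root_id}
        for citing, cited in self.cites:
            ids.update((citing, cited))
        return ids


class Pool:
    """Ingested simple objects plus the enhanced publications built over them."""

    def __init__(self) -> None:
        self.objects: Dict[str, SimpleObject] = {}
        self.publications: Dict[str, EnhancedPublication] = {}

    def ingest(self, obj: SimpleObject) -> None:
        self.objects[obj.object_id] = obj

    def create_publication(self, name: str, root_id: str) -> EnhancedPublication:
        if root_id not in self.objects:
            raise KeyError(f"unknown object: {root_id}")
        pub = EnhancedPublication(root_id)
        self.publications[name] = pub
        return pub

    def add_citation(self, name: str, citing: str, cited: str) -> None:
        # Relationships are added in a second stage, over already pooled objects.
        for oid in (citing, cited):
            if oid not in self.objects:
                raise KeyError(f"unknown object: {oid}")
        self.publications[name].cites.add((citing, cited))

    def publications_containing(self, object_id: str) -> List[str]:
        return sorted(n for n, p in self.publications.items()
                      if object_id in p.members())


pool = Pool()
pool.ingest(SimpleObject("p1", "file:///p1.pdf", DublinCoreRecord("Paper one", "A. Author")))
pool.ingest(SimpleObject("p2", metadata=DublinCoreRecord("Paper two", "B. Author")))
pool.ingest(SimpleObject("p3", payload_uri="file:///p3.pdf"))
pool.create_publication("ep-a", "p1")
pool.add_citation("ep-a", "p1", "p2")
pool.create_publication("ep-b", "p3")
pool.add_citation("ep-b", "p3", "p2")
print(pool.publications_containing("p2"))  # ['ep-a', 'ep-b']
```

Because the objects are immutable and carry no edges, adding or removing an enhanced publication never touches the pooled objects, which is one way to read the requirement that relationships be kept apart from the objects.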