Web news categorization using a cross-media document graph

José Iria, Fabio Ciravegna, João Magalhães
2009 Proceeding of the ACM International Conference on Image and Video Retrieval - CIVR '09  
In this paper we propose a multimedia categorization framework that is able to exploit information across different parts of a multimedia document (e.g., a Web page, a PDF, a Microsoft Office document). For example, a Web news page is composed of text describing some event (e.g., a car accident) and a picture containing additional information regarding the real extent of the event (e.g., how damaged the car is) or providing evidence corroborating the text part. The framework handles multimedia information by considering not only the document's text and image data but also the layout structure, which determines how a given text block is related to a particular image. The novelties and contributions of the proposed framework are: (1) support for heterogeneous types of multimedia documents; (2) a document-graph representation method; and (3) the computation of cross-media correlations. Moreover, we applied the framework to the task of categorising Web news feed data, and our results show a significant improvement over a single-medium framework.
doi:10.1145/1646396.1646431 dblp:conf/civr/IriaCM09 fatcat:2g6g6v7kdfbt7p5bewia55nosa