A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2010; you can also visit the original URL.
The file type is
Conventional image categorization techniques primarily rely on low-level visual cues. In this paper, we describe a multimodal fusion scheme which improves the image classification accuracy by incorporating the information derived from the embedded texts detected in the image under classification. Specific to each image category, a text concept is first learned from a set of labeled texts in images of the target category using Multiple Instance Learning  . For an image under classificationdoi:10.1145/1180639.1180698 dblp:conf/mm/ZhuYC06 fatcat:gfnskh6brncl7fz3sva3ja2ag4