Multi-Modal Multiple-Instance Learning with the application to the cannabis webpage recognition

Yinjuan Wang, Nianhua Xie, Weiming Hu, Jinfeng Yang
2011 The First Asian Conference on Pattern Recognition  
With the development of the World Wide Web, more and more illicit drug webpages are appearing, so screening cannabis webpages on the internet is an important issue. Conventional methods that use only keyword-based or image-based approaches are not sufficient. We propose a Multi-Modal Multiple-Instance Learning (MMMIL) approach that combines text and image information for cannabis webpage recognition. The main technical contributions of our work are two-fold. First, the text ...ation associated with images is used to build a pre-classifier, which pre-selects pseudo-positive training bags from new webpages to update the multi-modal classifier. This can be seen as a pseudo active learning process. Second, we design an efficient instance selection technique that uses text information to speed up training without compromising performance. Experiments on a dataset containing over 40,000 images from more than 4,000 webpages demonstrate the effectiveness and efficiency of the proposed approach.
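The abstract does not spell out the algorithm, so the following is only a minimal sketch of the multiple-instance setting it describes. It assumes each webpage is a bag, each image on the page is an instance carrying an image feature vector plus its associated text, and the standard MIL assumption holds (a bag is positive if at least one instance is). The keyword list, `text_score`, the thresholds, the top-k instance selection and the linear max-pooled instance scorer are all hypothetical stand-ins, not the authors' pre-classifier, features or learner.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical keyword vocabulary; the abstract does not say which text features are used.
KEYWORDS = frozenset({"cannabis", "marijuana", "weed", "hemp"})


@dataclass
class Instance:
    """One image on a webpage: an image feature vector plus its associated text."""
    features: List[float]
    text: str


@dataclass
class Bag:
    """One webpage, treated as a bag of image instances (1 = cannabis page, 0 = other)."""
    instances: List[Instance]
    label: int = 0


def text_score(text: str) -> float:
    """Toy text pre-classifier (hypothetical): fraction of tokens that are keywords."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tok in KEYWORDS for tok in tokens) / len(tokens)


def bag_text_score(bag: Bag) -> float:
    """Standard MIL assumption: a bag scores as high as its best instance."""
    return max((text_score(inst.text) for inst in bag.instances), default=0.0)


def select_pseudo_positive(unlabeled: Sequence[Bag], threshold: float) -> List[Bag]:
    """Copy new pages the text pre-classifier is confident about and label them positive."""
    return [Bag(list(bag.instances), label=1)
            for bag in unlabeled if bag_text_score(bag) >= threshold]


def select_instances(bag: Bag, k: int) -> List[Instance]:
    """Keep the k instances with the highest text score to shrink the training set."""
    ranked = sorted(bag.instances, key=lambda inst: text_score(inst.text), reverse=True)
    return ranked[:k]


def mil_bag_score(bag: Bag, weights: Sequence[float], bias: float) -> float:
    """Image-side MIL scoring: linear instance scores, max-pooled over the bag."""
    return max(
        (sum(w * x for w, x in zip(weights, inst.features)) + bias for inst in bag.instances),
        default=float("-inf"),
    )


if __name__ == "__main__":
    pages = [
        Bag([Instance([0.9, 0.1], "buy cannabis seeds online"),
             Instance([0.2, 0.3], "contact us")]),
        Bag([Instance([0.1, 0.8], "garden tools and flowers")]),
    ]
    pseudo = select_pseudo_positive(pages, threshold=0.2)
    print(len(pseudo))  # 1: only the first page passes the text pre-filter
    print(select_instances(pages[0], k=1)[0].text)  # buy cannabis seeds online
    print(mil_bag_score(pages[0], weights=[1.0, -1.0], bias=0.0))
```

Max-pooling over instances is simply the most common way to encode the "at least one positive instance" assumption; the paper's multi-modal classifier may combine text and image evidence quite differently, and its instance selection may use a criterion other than top-k text scores.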
doi:10.1109/acpr.2011.6166680 dblp:conf/acpr/WangXHY11 fatcat:3tldu6rg3vdplcbczer5jkss5e