Minimally-Supervised Extraction of Entities from Text Advertisements

Sameer Singh, Dustin Hillard, Chris Leggetter
2010 North American Chapter of the Association for Computational Linguistics  
Extraction of entities from ad creatives is an important problem that can benefit many computational advertising tasks. Supervised and semi-supervised solutions rely on labeled data, which is expensive, time-consuming, and difficult to procure for ad creatives. A small set of manually derived constraints on feature expectations over unlabeled data can be used to partially and probabilistically label large amounts of data. Utilizing recent work in constraint-based semi-supervised learning, this paper injects lightweight supervision specified as these "constraints" into a semi-Markov conditional random field model of entity extraction in ad creatives. Relying solely on the constraints, the model is trained on a set of unlabeled ads using an online learning algorithm. We demonstrate significant accuracy improvements on a manually labeled test set as compared to a baseline dictionary approach. We also achieve accuracy that approaches that of a fully supervised classifier.
dblp:conf/naacl/SinghHL10 fatcat:nls4wd6mu5c2hl62owh3jtekwm