Product name recognition and normalization in internet forums [thesis]

Yangjie Yao
Collecting users' feedback on products from Internet forums is challenging because users often mention a product with informal abbreviations or nicknames. In this paper, we propose a method named Gren to recognize and normalize mobile phone names from Internet forums. Instead of directly recognizing phone names from sentences, as in most named entity recognition tasks, we first generate candidate names. The candidate names capture short forms, spelling variations, and nicknames of products, but are not noise free. To predict whether a candidate name mention in a sentence indeed refers to a specific phone model, we develop a name recognizer based on a Conditional Random Field (CRF). The CRF model is trained on a large set of sentences obtained in a semi-automatic manner with minimal manual labeling effort. Lastly, a rule-based name normalization component maps a recognized name to its formal form. Evaluated on more than 4000 manually labeled sentences with about 1000 phone name mentions, Gren achieves precision and recall of 0.918 and 0.875 respectively, with the best feature setting.
doi:10.32657/10356/61814 fatcat:jvufsos5fff6rglivtcz3e34zq