Automatic Entity Recognition and Typing in Massive Text Corpora

Xiang Ren, Ahmed El-Kishky, Chi Wang, Jiawei Han
2016 Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion  
In today's computerized and information-based society, we are inundated with vast amounts of natural language text data, ranging from news articles, product reviews, and advertisements to a wide range of user-generated content from social media. To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of entities and the relationships between them. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in different kinds of text corpora (especially in massive, domain-specific text corpora). These methods can automatically identify token spans as entity mentions in text and label their types (e.g., people, product, food) in a scalable way. We demonstrate on real datasets, including news articles and Yelp reviews, how these typed entities aid in knowledge discovery and management.
doi:10.1145/2872518.2891065 dblp:conf/www/RenEWH16 fatcat:nhwxdfwwpbdgpm2mhuitgnmtkm