Automatic Class Labeling for CiteSeerX

Surya Dhairya Kashireddy, Susan Gauch, Syed Masum Billah
2013 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)
The CiteSeerX project at the University of Arkansas uses a browsing interface based on the Association for Computing Machinery's Computing Classification System (ACM CCS). CCS contains just 369 categories, whereas the CiteSeerX database contains over 2 million documents. This results in more than 6500 documents per category, far too many to browse. To address this problem, we are exploring ways to automatically expand the CCS ontology. Previous work has focused on using clustering to automatically identify the new classes. This work focuses on how to label the subclasses in a semantically meaningful way so that they can support user browsing. We develop methods based on text mining from the subclass members to extract class labels. We evaluate three methods by comparing the suggested labels with human-assigned labels for existing categories.
doi:10.1109/wi-iat.2013.35 dblp:conf/webi/KashireddyGB13 fatcat:7kb4ypr33je2fhqboeitrju3z4