Learning to Extract Folktale Keywords

Dolf Trieschnigg, Dong Nguyen, Mariët Theune
2013 Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities  
Manually assigned keywords provide a valuable means for accessing large document collections. They can serve as a shallow document summary and enable more efficient retrieval and aggregation of information. In this paper we investigate keywords in the context of the Dutch Folktale Database, a large collection of stories including fairy tales, jokes and urban legends. We carry out a quantitative and qualitative analysis of the keywords in the collection. Up to 80% of the assigned keywords (or a minor variation) appear in the text itself. Human annotators show moderate to substantial agreement in their judgment of keywords. Finally, we evaluate a learning-to-rank approach to extract and rank keyword candidates. We conclude that this is a promising approach to automate this time-intensive task.
dblp:conf/latech/TrieschniggNT13 fatcat:sdtzxlpajnfq7h2bh7af6vyqtq