Finding high-quality content in social media

Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis, Gilad Mishne
2008 Proceedings of the international conference on Web search and web data mining - WSDM '08  
The quality of user-generated content varies drastically from excellent to abuse and spam. As the availability of such content increases, the task of identifying high-quality content in sites based on user contributions (social media sites) becomes increasingly important. Social media in general exhibit a rich variety of information sources: in addition to the content itself, there is a wide array of non-content information available, such as links between items and explicit quality ratings from members of the community. In this paper we investigate methods for exploiting such community feedback to automatically identify high-quality content. As a test case, we focus on Yahoo! Answers, a large community question/answering portal that is particularly rich in the amount and types of content and social interactions available in it. We introduce a general classification framework for combining the evidence from different sources of information, that can be tuned automatically for a given social media type and quality definition. In particular, for the community question/answering domain, we show that our system is able to separate high-quality items from the rest with an accuracy close to that of humans.
doi:10.1145/1341531.1341557 dblp:conf/wsdm/AgichteinCDGM08 fatcat:utsxpt5find2dn6edtcnmnuipm