Fine-grained controversy detection in Wikipedia

Siarhei Bykau, Flip Korn, Divesh Srivastava, Yannis Velegrakis
2015 IEEE 31st International Conference on Data Engineering
The advent of Web 2.0 gave birth to a new kind of application in which content is generated through the collaborative contributions of many different users. This form of content generation is believed to produce data of higher quality, since the "wisdom of the crowds" makes its way into the data. However, a number of specific data quality issues appear within such collaboratively generated data. Apart from normal updates, there are cases of intentional harmful changes, known as vandalism, as well as naturally occurring disagreements on topics that do not have an agreed-upon viewpoint, known as controversies. While much work has focused on identifying vandalism, there has been little prior work on detecting controversies, especially at a fine granularity. Knowing about controversies when processing user-generated content is essential for understanding the quality of the data and the trust that should be placed in it. Controversy detection is a challenging task since, in the highly dynamic context of user updates, one needs to differentiate among normal updates, vandalism, and actual controversies. We describe a novel technique that finds these controversial issues by analyzing the edits that have been performed on the data over time. We apply the technique to Wikipedia, the world's largest known collaboratively generated database, and show that our approach has higher precision and recall than baseline approaches and is capable of finding previously unknown controversies.
doi:10.1109/icde.2015.7113426 dblp:conf/icde/BykauKSV15 fatcat:nrioprclobbcrn6gk763mvkimy