Leveraging the Crowd to Detect and Reduce the Spread of Fake News and Misinformation

Jooyeon Kim, Behzad Tabibian, Alice Oh, Bernhard Schölkopf, Manuel Gomez-Rodriguez
Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (WSDM '18), 2018
Online social networking sites are experimenting with the following crowd-powered procedure to reduce the spread of fake news and misinformation: whenever a user is exposed to a story through her feed, she can flag the story as misinformation and, if the story receives enough flags, it is sent to a trusted third party for fact checking. If this party identifies the story as misinformation, it is marked as disputed. However, given the uncertain number of exposures, the high cost of fact checking, and the trade-off between flags and exposures, the above procedure requires careful reasoning and smart algorithms which, to the best of our knowledge, do not exist to date. In this paper, we first introduce a flexible representation of the above procedure using the framework of marked temporal point processes. Then, we develop a scalable online algorithm, Curb, to select which stories to send for fact checking, and when to do so, to efficiently reduce the spread of misinformation with provable guarantees. In doing so, we need to solve a novel stochastic optimal control problem for stochastic differential equations with jumps, which is of independent interest. Experiments on two real-world datasets gathered from Twitter and Weibo show that our algorithm may be able to effectively reduce the spread of fake news and misinformation.

* This work was done during Jooyeon Kim's internship at the Max Planck Institute for Software Systems.

1 https://www.washingtonpost.com/posteverything/wp/2016/06/16/why-the-post-truth-political-era-might-be-around-for-a-while/
2 https://www.theguardian.com/commentisfree/2016/may/13/boris-johnson-donald-trump-post-truth-politician
3 https://newsroom.fb.com/news/2016/12/news-feed-fyi-addressing-hoaxes-and-fake-news/
4 https://www.washingtonpost.com/news/the-switch/wp/2017/06/29/twitter-is-looking-for-ways-to-let-users-flag-fake-news/

... their feeds, they have a choice to flag the story as misinformation and, if the story receives enough flags, it is directed to a coalition of independent organizations 6, signatories of Poynter's International Fact Checking Code of Principles 7, for fact checking. If the fact checking organizations identify a story as misinformation, it gets flagged as disputed and may also appear lower in the users' feeds, reducing the number of people who are exposed to misinformation.
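As a rough illustration of the crowd-powered procedure described above (not the paper's actual model or the Curb algorithm), one can simulate a single story's exposures as a marked temporal point process: exposure events arrive over time, and each event carries a binary mark recording whether the exposed user flagged the story. The exposure rate, flagging probability, and the `simulate_story` helper below are all hypothetical choices for the sketch.

```python
import random

def simulate_story(rate=2.0, flag_prob=0.3, horizon=10.0, seed=0):
    """Simulate exposures to one story as a homogeneous Poisson process
    with the given rate over [0, horizon]; each exposed user independently
    flags the story with probability flag_prob. Returns the marked events
    as (time, flagged) pairs in increasing time order."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        t += rng.expovariate(rate)  # inter-event time of a Poisson process
        if t > horizon:
            break
        events.append((t, rng.random() < flag_prob))
    return events

events = simulate_story()
n_exposures = len(events)
n_flags = sum(flagged for _, flagged in events)
```

In this toy setting, a naive threshold rule would send the story for fact checking once `n_flags` exceeds some fixed count; the trade-off discussed in the paper arises because waiting for more flags also means more (possibly unwarned) exposures.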
In this context, online social networking sites are giving advice to their millions of users on how to spot misinformation online 8. However, the above procedure requires careful reasoning and smart algorithms which, to the best of our knowledge, do not exist to date:

- Uncertain number of exposures: the spread of information over social networking sites is a stochastic process, which may depend on, e.g., the information content, the users' influence, and the network structure. Thus, the number of users exposed to different stories varies greatly, and we need to consider probabilistic exposure models to capture this uncertainty.

- Fact checking is costly: given the myriad of (fake) stories spreading on online social networking sites and the observation that fact checking is a costly process, we can only expect (the reviewers from) the coalition of independent organizations to fact check a small percentage of the stories spreading over time. Therefore, it is necessary to decide which stories to fact check and when to do so.

- Flags vs. exposures: the more users are exposed to a story before it is sent for fact checking, the greater the confidence that the story is misinformation, but also the higher the potential damage if it does turn out to be misinformation. Thus, we need to find the optimal trade-off between misinformation evidence, by means of flagging data, and misinformation reduction, by means of preventing (unwarned) exposures to misinformation.

Our approach. To tackle the above challenges, we first introduce a novel representation of the above procedure using the framework of marked temporal point processes [1]. Then, we find which stories to send for fact checking by solving a novel stochastic optimal control problem for SDEs with jumps [17], which differs from the nascent literature on stochastic optimal control of social and information systems [49, 43, 48, 42] in two technical aspects:

I.
The control signal is a multidimensional survival process (i.e., a terminating temporal point process), defined by means of a set of conditional intensities (i.e., stories to fact check), whereas previous work has considered nonterminating temporal point processes as control signals.
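To make the notion of a terminating (survival) point process concrete, the sketch below samples a single fact-checking time from a given conditional intensity using standard thinning (rejection sampling against a dominating Poisson process), stopping at the first event. The linear intensity and the `sample_fact_check_time` helper are illustrative assumptions, not the intensities derived by the paper's optimal control solution.

```python
import random

def sample_fact_check_time(intensity, horizon, bound, seed=0):
    """Sample the (single) event time of a survival process with conditional
    intensity `intensity(t)` on [0, horizon] via thinning. `bound` must
    upper-bound intensity(t) on that interval. The process terminates at its
    first event; returns None if no event occurs before the horizon."""
    rng = random.Random(seed)
    t = 0.0
    while t < horizon:
        t += rng.expovariate(bound)  # candidate time from the dominating process
        if t < horizon and rng.random() < intensity(t) / bound:
            return t  # first accepted event terminates the process
    return None

# Hypothetical intensity that grows linearly in time, e.g., as flagging
# evidence accumulates; bounded above by 0.2 * 10.0 = 2.0 on [0, 10].
tau = sample_fact_check_time(lambda t: 0.2 * t, horizon=10.0, bound=2.0)
```

The terminating structure is what distinguishes this control signal: once a story is sent for fact checking, its intensity drops to zero, so at most one event is ever generated per story.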
doi:10.1145/3159652.3159734