Discovering political topics in Facebook discussion threads with graph contextualization

Yilin Zhang, Marie Poux-Berthe, Chris Wells, Karolina Koc-Michalska, Karl Rohe
2018 Annals of Applied Statistics  
We propose a graph contextualization method, pairGraphText, to study political engagement on Facebook during the 2012 French presidential election. It is a spectral algorithm that contextualizes graph data with text data for online discussion threads. In particular, we examine the Facebook posts of the eight leading candidates and the comments beneath these posts. We find evidence of both (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate, and (ii)
issue-centered structure (i.e. on political topics), where citizens' attention and expression are primarily directed towards a specific set of issues (e.g. economics, immigration, etc.). To identify issue-centered structure, we develop pairGraphText to analyze a network with high-dimensional features on the interactions (i.e. text). This technique scales to hundreds of thousands of nodes and thousands of unique words. In the Facebook data, spectral clustering without the contextualizing text information finds a mixture of (i) candidate and (ii) issue clusters. Contextualizing the graph with text data helps to separate these two structures. We conclude by showing that the novel methodology is consistent under a statistical model.

[...] and regression, Stieglitz and Dang-Xuan (2012) find that political tweets containing stronger emotions receive more public interactions. There are also studies of how political sentiment on SNSs reflects the offline political landscape (Tumasjan et al., 2011), and how it can affect political elections (Choy et al., 2011). Apart from topic or sentiment information, patterns of political discussion on SNSs are also of great theoretical and empirical interest to scholars of communication and political science. Such platforms have long been heralded for their potential to foster a "public sphere" in which ordinary citizens can recognize one another and hear reasons both for and against their own points of view (Papacharissi (2002)). More recent analyses of online political discourse are less optimistic, identifying instead vitriol, "trolling", and larger patterns of partisan polarization. As a result, a great deal of research investigates the extent to which online actors are connected to political opponents (Adamic and Glance (2005), Colleoni et al. (2014), Bakshy et al.
(2015)). Another approach to understanding the structure of political discussions is social network analysis, which aims to identify influential political actors and communities in the discussions (Stieglitz and Dang-Xuan, 2012) and to study properties of the communities (Robertson et al. (2010), Gonzalez-Bailon et al. (2010)). One popular community detection approach is spectral clustering (Von Luxburg, 2007), which is fast, easy to implement, and consistent under block models for networks (Holland et al. (1983), Airoldi et al. (2008), Qin and Rohe (2013)).

In this paper, we combine text mining and community detection to investigate the multiple dimensions of citizens' interactions with political content coming from political actors. In our data, which come from the 2012 French election, citizens commented on the presidential candidates' Facebook posts. This creates a communication network between two types of units: (i) citizens and (ii) candidate-posts, as the eight presidential campaigns each have posts on Facebook, and citizens comment on the posts. This paper studies the structure of the resulting discussion threads. The activities of the citizens are characterized by (i) which of the candidate-posts they comment on and (ii) the text of their comments. We are interested in two broad types of patterns in these activities: (i) candidate-centered structure, where citizens primarily comment on the wall of one candidate; and (ii) issue-centered structure, in which citizens' attention and expression are directed towards a specific set of issues (e.g. economics, immigration, etc.). To search for such patterns, we cluster the citizens based on their activities. In each cluster, we examine whether the activities of the citizens focus on particular candidates (i.e. candidate-centered) (Section 2.2) or on certain political issues (i.e. issue-centered) (Section 4).
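As a rough illustration of spectral clustering on a two-type network like the one described above, the sketch below co-clusters a small citizens-by-posts adjacency matrix using only NumPy. It is a generic textbook-style version, not the pairGraphText algorithm: it ignores the text entirely, and the function names (`normalized_adjacency`, `spectral_cocluster`), the mean-degree regularizer, the row normalization, and the farthest-point k-means initialization are all assumptions made for this example.

```python
import numpy as np

def normalized_adjacency(A):
    """Scale a citizens-by-posts matrix by regularized row and column degrees."""
    A = np.asarray(A, dtype=float)
    if A.sum() == 0:
        raise ValueError("the graph needs at least one edge")
    row_deg = A.sum(axis=1)
    col_deg = A.sum(axis=0)
    # Assumption: add the mean degree as a regularizer so that
    # low-degree citizens or posts do not dominate the embedding.
    d_row = row_deg + row_deg.mean()
    d_col = col_deg + col_deg.mean()
    return A / np.sqrt(d_row)[:, None] / np.sqrt(d_col)[None, :]

def unit_rows(X):
    """Project each row onto the unit sphere (all-zero rows stay zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms > 0, norms, 1.0)

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def spectral_cocluster(A, k):
    """Cluster citizens (rows) and posts (columns) from the top-k singular vectors."""
    L = normalized_adjacency(A)
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    citizen_labels = kmeans(unit_rows(U[:, :k]), k)
    post_labels = kmeans(unit_rows(Vt[:k].T), k)
    return citizen_labels, post_labels
```

Because the network is bipartite, the singular value decomposition plays the role that the eigendecomposition plays for a one-type graph: left singular vectors embed the citizens and right singular vectors embed the posts. For a matrix in which citizens 0 to 2 comment only on posts 0 and 1 and citizens 3 to 5 comment only on posts 2 and 3, `spectral_cocluster(A, 2)` recovers those two groups on both sides.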
This distinction reflects the possibility that the Facebook conversation might be organized more along lines of partisanship (candidate-centered), as opposed to matters of concern to "issue publics" (issue-centered) (Kim (2009)). There has been significant progress on both topic modeling for text (Blei, 2012) and community detection for social networks (Airoldi et al. (2008)). Recently, there has been significant interest in clustering networks for which we have additional information on the citizens in the networks (Chang and Blei (2010); Binkiewicz et al. (2017)). In this paper, we extend these ideas to the setting of discussion threads. Our network is two-way, or bipartite: the two types of units, citizens and candidate-posts, are linked by commenting in a discussion thread. Below, we refer to the links showing which citizens commented on which candidate-posts as the network or the graph. We refer to both the text in candidate-posts and the text in citizen-comments as the text. The duality between citizens and candidate-posts also appears in the text; candidates say things differently from citizens. A key difficulty in analyzing this process, and the key methodological innovation of this paper, is [...] FrenchElection.

Incorporating the text makes the central conversations more vivid representations of the clusters, allowing for a more precise interpretation of the topic. During the 2012 French election, the citizens devoted their attention and expression to (i) the debates and fights among different candidates, (ii) radical discussions on Islam, religion, and immigration, and (iii) other topics including ecology, economy, and crises.

Citizen-clusters and post-clusters: Pro-Hollande. The central conversations are on Hollande's wall and criticize Sarkozy or praise Hollande.
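To make the terms "graph" and "text" as used above concrete, the following minimal sketch builds both from raw discussion-thread records. The input layout (a dictionary of post texts and a list of `(citizen_id, post_id, comment_text)` tuples), the names `tokenize` and `build_graph_and_text`, the simple tokenizer, and the choice to count repeated comments as edge weights are all assumptions for illustration; the source does not describe its preprocessing.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Assumption: lowercase, split on whitespace, strip edge punctuation."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in tokens if w]

def build_graph_and_text(posts, comments):
    """Build the bipartite graph and the two bags of words.

    posts:    {post_id: post_text}
    comments: iterable of (citizen_id, post_id, comment_text)
    """
    graph = defaultdict(Counter)          # citizen -> post -> number of comments
    citizen_words = defaultdict(Counter)  # citizen -> word counts of their comments
    # Post text is kept separate from comment text, since candidates and
    # citizens may use different vocabulary.
    post_words = {pid: Counter(tokenize(text)) for pid, text in posts.items()}
    for citizen, pid, text in comments:
        if pid not in posts:
            raise KeyError(f"comment refers to unknown post {pid!r}")
        graph[citizen][pid] += 1
        citizen_words[citizen].update(tokenize(text))
    return dict(graph), dict(citizen_words), post_words
```

The nested counters are a sparse representation of a citizens-by-posts matrix and two document-term matrices; with hundreds of thousands of citizens, a sparse layout of this kind (or a sparse matrix library) avoids storing the mostly empty dense arrays.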
doi:10.1214/18-aoas1191 fatcat:5bpjowaa4jcotcu4cblgzlccay