Online political networks of Boulder, Part I: Friend networks
Hearst’s yellow journalism. FDR’s fireside chats. The Nixon-Kennedy debates. Obama’s 2008 campaign. New communication technologies shape American politics. Much of our attention focuses on candidates and campaigns at the national level, but the local politics of school boards, zoning commissions, health departments, and city councils have far greater impacts on our day-to-day lives even though their elections and appointments attract a tiny fraction of the attention of national campaigns. However, the role of social media in shaping local politics is a topic that social media researchers like myself have largely overlooked.
The importance of understanding the intersection of social media and local politics has only grown since 2020. In the aftermath of both the pandemic and the 2020 election, local officials overseeing elections, schools, and public health have been targeted for implementing common-sense policies. Social media platforms play roles in amplifying information that motivates people to turn out to increasingly combative hearings. While Boulder’s demographics are hardly representative, Boulder sits within one of the most educated and connected counties in the U.S. and hosts growing offices for major tech firms like Google, Twitter, Amazon, and Apple. The influence of social media on local politics is therefore likely to be stronger in Boulder than in most places.
Although it is regularly ranked as one of the “best places to live” in the United States and has been a politically liberal enclave for decades, Boulder still grapples with polarizing issues around housing affordability, climate change adaptation, and racial justice that are likely to play an important role in shaping the structure of its local social media politics. Moreover, at least one city council candidate filed a civil complaint against community members alleging defamation because of their social media behavior: what happens on local politics Twitter does not stay on local politics Twitter. Boulder also happens to be where I live, so I know enough about the issues and actors to contextualize the patterns we will see in the data below.
Method
There are many social platforms like Facebook, Reddit, or Nextdoor where local politics play out, but Twitter is unique for at least two reasons. First, Twitter has more of the characteristics of a public forum in terms of convening audiences and amplifying messages. Local candidates use Twitter for official, public-facing functions like messaging, organizing, and engaging with supporters in ways that Facebook’s privacy settings, Reddit’s ephemerality, or Nextdoor’s hyperlocalness do not. Second, Twitter makes it easier for researchers like myself to collect data about relationships, content, and interactions than other online social platforms. Its API makes it relatively easy to retrieve public data like who-follows-whom and who-posts-what that reflects the kinds of social structures that influence users’ exposure and reaction to information.
The kind of analyses we will explore in this post draw on methods from social network analysis: understanding the relationships linking social actors together. I will use graph theory terms like “node” to refer to individual Twitter accounts and “edge” or “link” to refer to the connections between accounts. There are multiple kinds of connections we can derive from Twitter data like friends (the accounts I follow), mentions (the accounts I mention in my tweets), retweets (the accounts I amplify to my network), and replies (the accounts I reply to). Some of these connections are “stronger” in the sense that they are more likely to capture meaningful social relationships.
Before we capture a single bit of Twitter data, we need to think about defining the boundaries of Boulder’s political networks. Should we include people who post tweets geolocated within Boulder? Most users do not geolocate their tweets and those who do may be visitors whose behavior has weak political salience. People who use #boulder? Again, most online conversations and interactions about Boulder’s politics do not include the hashtag. Should we start with the accounts of state and national-level politicians? This would undermine the motivation to emphasize local politics.
Data collection
The first step of my data collection strategy starts with the nine current members of the Boulder City Council and the ten council candidates for the 2021 election. Three of the current members (Nagle, Wallach, Yates) and five of the candidates (Christy, Decalo, Rosenblum, Takahashi, Wallach again) had no publicly-accessible or discernible Twitter presence. This leaves the six accounts of current council members (Aaron Brockett, Rachel Friend, Junie Joseph, Adam Swetlik, Sam Weaver, Mary Young) and the five accounts of candidates (Matt Benjamin, Lauren Folkerts, Nicole Speer, Dan Williams, Tara Winer). I am going to call these 11 council members and candidates with publicly-accessible Twitter accounts our “seed” accounts. Caveat scholar: my Twitter data collection strategy is blind to the political behavior and relationships involving almost half the population (8/19) of council members and candidates.
The second step of my data collection strategy is to retrieve the “friends” of these eleven seed accounts. “Friend” is what Twitter calls the accounts followed by an account: the “followees” rather than “followers”. Twitter users have little control over who follows them, but complete control over who they follow. The accounts a Twitter user chooses to follow say much more about the kinds of content they want to see in their feeds as well as basic social norms like reciprocity: if you follow me, I will follow you back. Twitter’s API allows you to retrieve the “friends” of any public account. Our 11 seed accounts follow 2,886 unique accounts. These 2,886 friends include other seeds, local Boulder personalities and organizations, as well as accounts reflecting users’ specific interests and backgrounds.
The third step of my data collection strategy involves filtering these 2,886 accounts down. The Twitter API for retrieving this friend data is (purposefully) slow: approximately one account per minute. Retrieving the “friend” relationships for all 2,886 friends would therefore take at least 2,886 minutes or more than 48 hours. Of the 2,886 friends, 2,194 are followed by only one seed while 28 friends are followed by 8 of the seed accounts. I decided to only include the 692 friends who are followed by at least two of the eleven seed accounts. In other words, our network only includes the 692 Twitter accounts that are followed by at least two Boulder City Council members or candidates. By way of disclosure, my personal account is one of these 692 accounts (four council members and two candidates follow me).
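The “followed by at least two seeds” rule can be sketched in a few lines of Python. This is a minimal illustration rather than the code from the repository: the account names and the `nominated_at_least` helper are made up for the example.

```python
from collections import Counter

# Hypothetical toy data: seed account -> set of accounts it follows ("friends").
seed_friends = {
    "seed_a": {"alice", "bob", "carol"},
    "seed_b": {"bob", "carol", "dave"},
    "seed_c": {"carol", "erin"},
}

def nominated_at_least(seed_friends, min_seeds=2):
    """Return the friends followed by at least `min_seeds` distinct seeds."""
    counts = Counter()
    for friends in seed_friends.values():
        counts.update(friends)  # each seed adds one "nomination" per friend
    return {account for account, n in counts.items() if n >= min_seeds}

kept = nominated_at_least(seed_friends, min_seeds=2)
print(sorted(kept))
```

Raising `min_seeds` shrinks the sample quickly, which is the same trade-off between coverage and collection time described above.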
The third step also requires filtering these 692 doubly-nominated friends down further. Some of these accounts include national politicians like Barack Obama, state politicians like Jared Polis, and journalists like Kyle Clark who have tens or hundreds of thousands of “friends”. Again, because Twitter only lets us collect 5,000 friends of an account per request, getting just Barack Obama’s almost 600,000 friends would’ve taken two hours and only the smallest fraction of them would be relevant to Boulder politics. Not a great use of my time or the pollution I am causing Twitter’s data centers to generate. Based on the distribution of friend counts among these 692 accounts, I elected to only get the friends-of-friends (more on this next) for accounts that have up to 25,000 friends. This eliminates four high-friend accounts (barackobama, trish_zornio, 505nomad, and amyklobuchar) and leaves us with 688 friends, including many high-friend accounts like Governor JaredPolis, Daily Camera journalist MitchellByars, and CU professor RogerPielkeJr.
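The back-of-the-envelope timing can be written down as a quick sketch. It assumes the pace described in this post (5,000 friends per request, roughly one request per minute); the friend counts and account labels are hypothetical stand-ins, not values from the dataset.

```python
import math

PAGE_SIZE = 5_000          # friends returned per request, as described above
MINUTES_PER_REQUEST = 1    # assumed pace under the rate limit
MAX_FRIENDS = 25_000       # cutoff chosen in this post

def estimated_minutes(friend_count):
    """Minutes needed to page through one account's friend list."""
    requests = max(1, math.ceil(friend_count / PAGE_SIZE))
    return requests * MINUTES_PER_REQUEST

# Hypothetical friend counts, for illustration only.
friend_counts = {
    "national_politician": 600_000,
    "local_reporter": 4_200,
    "activist": 26_000,
}

# Keep only the accounts at or under the cutoff.
keep = {name for name, n in friend_counts.items() if n <= MAX_FRIENDS}

print(estimated_minutes(600_000))  # 120 requests at one per minute
print(sorted(keep))
```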
The fourth step is to see which of the seeds’ friends accounts are friends with each other. For example, JaredPolis is not on Boulder’s City Council, but as the city’s congressional representative for more than a decade and now the state’s governor, his account is likely friends with many of the other friends of our 11 seeds. With our 688 “doubly-nominated but not-too-many-friends” accounts surrounding our 11 seed accounts, we can get their friends as well. Because of the Twitter API’s rate limits, getting these 688 friends’ friends will take at least 688 minutes, or over 11 hours. Definitely the kind of thing you want to leave running overnight or in the background! Our 688 friends have 116,947 unique friends-of-friends, including seeds and friends.
The fifth step is to filter the friends and friends-of-friends data. Because we are interested in the relationships among the 692 friend accounts surrounding our 11 seed accounts, we ignore any friend-of-friend relationships unless both accounts are in the 692 friends. In the parlance of social network analysis, this is known as a “1.5-step ego network” around our 11 seed nodes. From the 116,947 friends-of-friends relationships, there are 63,697 friends-of-friends relationships among the 692 friends. The missing 53,250 relationships involved friends-of-friends who were not among the 692 friends of the seeds.
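Keeping a relationship only when both accounts are in the sample amounts to taking an induced subgraph. A minimal sketch, with invented account names and a hypothetical `induced_edges` helper:

```python
def induced_edges(edges, keep):
    """Keep only the directed edges whose source AND target are both in `keep`."""
    keep = set(keep)
    return [(src, dst) for src, dst in edges if src in keep and dst in keep]

# Hypothetical sample: three accounts inside the boundary, two outside it.
friends = {"alice", "bob", "carol"}
edges = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "zed"),    # zed is outside the sample, so this edge is dropped
    ("alice", "yan"),    # yan is outside the sample, so this edge is dropped
    ("carol", "alice"),
]

kept = induced_edges(edges, friends)
dropped = len(edges) - len(kept)
print(kept, dropped)
```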
Network construction and visualization
We have our 692 accounts and the 63,697 “friend” relationships that connect them. For a social network, this is unusually dense: about 13.3% of the friend ties that could exist (63,697 of the 692 × 691 possible directed ties) actually do exist. 13 percent sounds low, but many social networks have densities below 1%, meaning Boulder’s local political network is about 10 times denser than I naively expected. This density is likely a product of a few things: our network sampling returns the dense subset of a larger and sparser network, social processes like homophily create more ties than exist among random strangers, and adding one more “friend” tie on Twitter has a negligible cost compared to adding additional offline friendship ties.
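Density for a directed network is the number of observed ties divided by the number of ordered pairs of distinct accounts. A quick check using the node and edge counts quoted in this post comes out at roughly 13%; the function name is just for illustration.

```python
def directed_density(n_nodes, n_edges):
    """Share of possible directed ties (ordered pairs, no self-loops) that exist."""
    possible = n_nodes * (n_nodes - 1)
    return n_edges / possible if possible else 0.0

density = directed_density(692, 63_697)
print(f"{density:.1%}")
```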
This density also introduces a problem for visualizing our network. This network was laid out using a “spring embedding” algorithm called ForceAtlas2 using the Gephi network visualization package. This is not a scatterplot where each node’s position on the x and y axes encodes some meaningful information. Instead, nodes are attracted to each other if they share ties and repulsed if they do not, with the goal that the algorithm cluster nodes that share many links and separate nodes that share few links. There are so many friend relationships that we have a “hairball” where there is no discernible local structure and every account appears to be connected to every other account (even though only 13% of these possible ties actually exist). We can still compute network statistics (more on this later) on this dense network to identify key players, but the hairball is hard to engage with and think through, and it is unclear whether this network captures anything meaningful about local politics.
Pruning hairballs for the purposes of visualization is a common practice among network scientists. The goal is to reduce the number of connections in the network in some consistent and justifiable way while hopefully preserving as many of the nodes as possible. I hope future blog posts will explore how these friend connections are a substrate over which other kinds of interactions occur, but that will require collecting still more data.
Let’s focus instead on the data we have with these binary friend connections: an account follows another account or it doesn’t. We have a ready intuition that some of these ties must be “stronger” than others in some kind of way. We can impute that some connections are stronger if there are more connections around them: conformity, similarity, and exchange are all mechanisms that make it hard for you not to be friends with someone all your friends are friends with. If Aaron Brockett’s friends are all friends with Sam Weaver on Twitter, then Brockett and Weaver are very likely to also be friends on Twitter. The strength of this relationship can be coarsely quantified with the Jaccard index: the number of friends two accounts have in common divided by the number of distinct accounts that either of them follows. For Brockett and Weaver in our Boulder politics network, their Jaccard index is 0.416: 41.6% of the accounts that at least one of them follows are followed by both of them.
We can compute these scores for every friend relation in the network to give us a measure of strength. We can actually compute two Jaccard scores: the “in-Jaccard” captures the overlap in followers and the “out-Jaccard” captures the overlap in friends. Jared Polis and I have an “out-Jaccard” score of 0.268 (26.8% of our friends overlap in the network) and an “in-Jaccard” score of 0.153 (15.3% of our followers overlap in the network). The distribution of these scores is very positively skewed, meaning most relationships in the network have relatively low Jaccard scores.
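To make the two scores concrete, here is a small sketch on a toy edge list where each pair `(source, target)` means “source follows target”. The accounts and helper names are hypothetical, and for simplicity the two accounts being compared are not removed from each other’s sets, which an actual analysis might handle differently.

```python
from collections import defaultdict

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_neighbors(edges):
    """Return (friends, followers) lookup tables from directed follow edges."""
    friends, followers = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        friends[src].add(dst)     # src follows dst
        followers[dst].add(src)   # dst is followed by src
    return friends, followers

# Hypothetical follow edges.
edges = [
    ("a", "x"), ("a", "y"), ("a", "z"),
    ("b", "y"), ("b", "z"), ("b", "w"),
    ("x", "a"), ("x", "b"), ("y", "a"),
]
friends, followers = build_neighbors(edges)

out_j = jaccard(friends["a"], friends["b"])      # overlap in who they follow
in_j = jaccard(followers["a"], followers["b"])   # overlap in who follows them
print(out_j, in_j)
```

In this toy case, a and b share two of the four accounts either follows, and one of the two accounts that follow either of them, so both scores are 0.5.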
Now that we have these scores, we can use them to filter the network to only the “strongest” ties. More aggressive thresholding reveals clearer structures but drops more nodes, while less aggressive thresholding preserves more nodes but leaves a less clear structure. I chose a completely arbitrary threshold of 0.33 for the “out-Jaccard” scores as the boundary where strong ties begin. Connections where the two accounts share less than one third of their combined friends get discarded: more than 90% of the connections in the network. My Twitter relationship with Governor Polis is no more.
This filtered network also lost 57% of its nodes: these nodes simply didn’t have any friend connections with out-Jaccard scores (strong friends in common) above 0.33 and so they were removed. This includes accounts like potus, michelleobama, mitchellbyars, and ericmbudd. Switching the thresholding to in-Jaccard scores (strong followers in common) above 0.33 keeps some of these salient nodes and drops others. Either strategy keeps the top 10% strongest overlapping friend relationships and gives us a window into the “backbone” of the absolute strongest connections in the network, one that is much more interpretable than the original hairball, but with trade-offs like removing about half of the accounts.
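The thresholding step, and the way it silently removes nodes, can be sketched as follows. The scores are invented for the example, `backbone` is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```python
def backbone(edge_scores, threshold=0.33):
    """Keep edges scoring at or above `threshold`; report nodes left with no edges."""
    kept = {edge: score for edge, score in edge_scores.items() if score >= threshold}
    nodes_before = {node for edge in edge_scores for node in edge}
    nodes_after = {node for edge in kept for node in edge}
    return kept, nodes_before - nodes_after

# Hypothetical out-Jaccard scores keyed by (source, target).
edge_scores = {
    ("a", "b"): 0.42,
    ("b", "c"): 0.35,
    ("c", "d"): 0.10,
    ("d", "e"): 0.27,
    ("a", "c"): 0.05,
}

kept, dropped_nodes = backbone(edge_scores)
print(sorted(kept), sorted(dropped_nodes))
```

Here two of five edges survive, and accounts d and e vanish because none of their ties clear the bar, which mirrors how well-known accounts can drop out of the filtered picture.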
Results
Visualization
There are two metrics for pulling out the most “important” ties in the network for the purposes of visualization: having many followers in common (in-Jaccard) and having many friends in common (out-Jaccard). I have encoded additional data through various visual channels in this visualization. In addition to using the same ForceAtlas2 algorithm to lay out the filtered network, the nodes are sized by the number of connections they receive from other nodes. This is known as “in-degree centrality” among network scientists and you can think of it as popularity: many other accounts want to connect to this account. I have also colored the nodes based on their modularity class. Modularity-based community detection algorithms define a community boundary where there are more connections among the nodes within the boundary than across the boundary.
For the out-Jaccard thresholded network, the community detection algorithm identifies six different communities in the filtered network. The assignment of nodes to these communities is stochastic: different runs of the algorithm might return different community assignments. The green community on the left side corresponds with accounts associated with newer and more progressive activists and candidates. The blue community in the center corresponds with established Boulder-based politicians like Governor Polis and former Speaker KC Becker. The black community at the bottom consists of accounts associated with community organizations like boulderpolice, bouldercolorado, and boulderchamber. The thin line of orange-colored accounts are journalists. The pink community consists of state-level politicians and journalists like coloradodems or jenagriswold. The red nodes in the upper-right are national-level politicians and officials like aoc and ossoff.
For the in-Jaccard thresholded network, the community detection algorithm picks up six-ish communities. As before, there’s a group of national-level politicians (upper-right in red), a group of state-level politicians (top in green), and a group of city-level organizations (bottom in pink). The community that previously had journalists in it was merged with some of the established local accounts (center in orange) and the progressive-activist sub-community remains distinctive. The algorithm also identifies a new sub-community composed of less ideological activists (bottom-left in yellow). Because this community detection algorithm is stochastic, different runs will return different community structures and assignments of accounts to communities. The modularity score is 0.167, which is low and indicates the algorithm is struggling to find easily separable sub-communities.
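For intuition about what a modularity score measures, the sketch below computes the standard undirected modularity of a given partition: for each community, the share of edges that fall inside it minus the share expected from its members’ degrees alone. This is a simplification for illustration (the friend network is directed and the communities here came from Gephi), and the toy graph is invented.

```python
def modularity(edges, communities):
    """Undirected modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    label = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)    # L_c: edges inside community c
    degree_sum = [0] * len(communities)  # d_c: total degree of nodes in c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )

# Toy graph: two triangles joined by a single bridging edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]

q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])   # the "natural" partition
q_lumped = modularity(edges, [{1, 2, 3, 4, 5, 6}])    # everything in one group
print(round(q_split, 3), round(q_lumped, 3))
```

Even this cleanly separated toy graph only reaches about 0.36, so a score of 0.167 on a dense network is consistent with communities that overlap heavily.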
Key players
The Jaccard filtering and visualizations above are helpful in exploring the different sub-communities hidden within the tangle of original data. But this filtering and visualization exercise also removed approximately 50% of the nodes in the network because they were left with no remaining connections. Throwing away half the data for the sake of aesthetics is hard to justify, especially in a context like local politics where the behavioral traces of attention and engagement are already thin. Shifting back to the complete dataset without any filtering, we can use some of the classic centrality metrics in network science to identify the “key players” in the friend network. There are many ways we can define “key players” within this network based on the patterns of Twitter friendship connections.
Degree centrality (Wikipedia) counts the number of connections each account has to the other accounts in the network. We can think of this as popularity. In a directed network, there are two kinds of degree centrality. In-degree centrality is the number of followers (connections received) the account has in this network and out-degree centrality is the number of friends (connections sent) the account has in the network. Council member Aaron Brockett is the most well-connected account by both metrics, following (out-degree) 565 of the 692 accounts in the network and being followed by (in-degree) 529 of the 692 accounts in the network. Other top accounts by these metrics include activist groups like boulderbedrooms, newr_boulder, abetterboulder, and boulderprogress, as well as accounts for newspapers and journalists like the Daily Camera, Ryan Warner, Mitchell Byars, Shay Castle, and Alex Burness.
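Counting in-degree and out-degree from a list of follow edges is simple enough to show directly. The edges below are made up; each pair `(source, target)` means “source follows target”.

```python
from collections import Counter

# Hypothetical follow edges.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]

out_degree = Counter(src for src, _ in edges)  # friends: connections sent
in_degree = Counter(dst for _, dst in edges)   # followers: connections received

print(in_degree.most_common(1))   # most-followed account in the toy network
print(out_degree.most_common(1))  # account following the most others
```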
Betweenness centrality (Wikipedia) counts the number of shortest paths between every pair of accounts in the network passing through a given account. We can think of this as brokerage. Accounts with high betweenness centrality scores are the bridges that connect the network together. Some of the key players by this metric were also key players by degree centrality: having many connections makes it more likely that you broker between groups. Council member Junie Joseph appears in this list although she wasn’t among the top degree central accounts, suggesting she connects parts of the network together despite having fewer connections.
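A compact way to compute betweenness on an unweighted directed network is Brandes’ algorithm: a breadth-first search from every node that counts shortest paths, followed by a backward pass that credits the nodes those paths run through. The sketch below returns unnormalized scores on an invented toy graph; it is meant to show the idea, not to reproduce the numbers in this post.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness for a directed, unweighted graph (Brandes)."""
    nodes = set(adj) | {t for targets in adj.values() for t in targets}
    score = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        dist = {s: 0}                        # BFS distance from s
        sigma = dict.fromkeys(nodes, 0.0)    # number of shortest paths from s
        sigma[s] = 1.0
        preds = {n: [] for n in nodes}       # predecessors on shortest paths
        order = []                           # nodes in order of discovery
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(nodes, 0.0)    # dependency of s on each node
        for w in reversed(order):            # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical graph: "broker" is the only route from a and d to c.
adj = {"a": ["broker"], "d": ["broker"], "broker": ["c"]}
scores = betweenness(adj)
print(scores)
```

The broker has only three ties but sits on both shortest paths that pass through an intermediary, which is the kind of pattern that can put a modestly connected account high on this list.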
Closeness centrality (Wikipedia) measures the proximity of an account to every other account in the network. We can think of this as pulse-taking. Accounts with high closeness centrality scores are in a better position to both listen to and influence the network because their messages can reach everyone more quickly. While distinct, closeness centrality tends to be correlated with degree centrality and our top accounts have some familiar names. Notably, the official accounts for the Daily Camera and the City of Boulder, Governor Polis’s personal account, and the official governor’s account all rank highly. It is unsurprising (given our data collection strategy) but promising that the online network of Boulder’s local politics centers on the accounts of elected officials, journalists, and official accounts.
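One simple version of closeness is the number of accounts a node can reach divided by the total number of steps needed to reach them. The sketch below follows outgoing ties with a breadth-first search; which direction to follow, and how to handle unreachable accounts, are choices I am assuming here for illustration. The star-shaped toy graph is invented.

```python
from collections import deque

def closeness(adj, source):
    """Reachable-node count divided by the sum of shortest-path distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical star: a hub with mutual ties to three otherwise unconnected accounts.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}

hub_score = closeness(adj, "hub")   # everyone is one step away
leaf_score = closeness(adj, "a")    # one step to the hub, two to the others
print(hub_score, leaf_score)
```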
Eigenvector centrality (Wikipedia) is a recursive measure of importance: an account is important if it is connected to other important accounts, whose importance is determined by their neighbors, and so on. We can think of this as influence. Again, while this is a distinct metric it tends to be highly correlated with metrics like degree, which is why some of the same names come up here as in other metrics. Some new entrants here include former Colorado House Speaker KC Becker and the official account for Boulder County. Their presence in this list and not the others could reflect a greater selectivity in their Twitter following behavior: connecting with fewer but more important accounts.
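The recursive definition can be solved by power iteration: start every account with the same score, repeatedly replace each score with the sum of the scores of the accounts pointing at it, and rescale until the values settle. The sketch below adds each node’s own score at every step to help convergence (an implementation choice I am assuming, not something taken from this analysis) and runs on a made-up graph with mutual ties.

```python
import math

def eigenvector_centrality(in_neighbors, iterations=200, tol=1e-10):
    """Power iteration; `in_neighbors[n]` lists the accounts that point at n."""
    nodes = sorted(in_neighbors)
    x = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # New score = own score + scores of everyone pointing at this node.
        new = {n: x[n] + sum(x[u] for u in in_neighbors[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x

# Hypothetical graph with mutual ties: a triangle (a, b, c) plus d tied only to a.
in_neighbors = {
    "a": ["b", "c", "d"],
    "b": ["a", "c"],
    "c": ["a", "b"],
    "d": ["a"],
}
scores = eigenvector_centrality(in_neighbors)
print({n: round(s, 3) for n, s in scores.items()})
```

Account d has a single tie, but because that tie is to the best-connected account it still receives a meaningful share of importance, which is the selectivity effect described above.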
Discussion
From an “information hygiene” perspective, I feel a strong sense of reassurance about the structure of Boulder’s local political network. The network is strongly connected together with little superficial polarization. Different sub-communities are represented on closer inspection and the composition and relationship between these sub-communities captures a cross-section of local political life. The key players in the network are likewise politicians, established journalists, official accounts, and mainstream activists rather than the disinformation peddlers and rabble-rousing extremists we hear about elsewhere. Do these results give us a clear path out of the thicket of issues our community faces around housing affordability, climate change adaptation, racial justice, and pandemic response? I don’t think those answers lie in these data, but I remain heartened by the potential for using social technologies to convene more responsive conversations and organize political action about how national and global issues are playing out at the local level.
The data presented here are not neutral: they are fundamentally imbued by decisions I made from data collection through visualization and analysis to include some accounts and not others. Almost 50% of city council members and candidates are not even in this analysis because they are not on Twitter. The visualizations removed another 50% of the accounts that were present because they lacked the strongly-embedded ties to other accounts in this sample. The strongly-tied communities in the visualizations or top-connected accounts in the key player analysis only generalize as far as Twitter behavior generalizes, which is to say, not far beyond Twitter itself. What plays out in the letters to the Camera and what happens on Nextdoor, Facebook, or r/Boulder are likewise not captured here. Is Council member Aaron Brockett the king-maker in Boulder or simply a representative who is active on Twitter and reciprocally follows his constituents?
This is only a snapshot in September 2021 of these Twitter friend networks, which we should expect will continue to change as new issues, activists, journalists, and politicians enter the arena. These following relationships are also “inexpensive” and “thin” in the sense that an account’s choice to follow another is the product of multiple motivations. More substantive relationships like who-mentions-whom, who-replies-to-whom, who-retweets-whom, who-hashtags-what, etc. may give more substantive insights into the particular dynamics of engagement or polarization around different issues. I’ll leave that for a future blog post!
Appendix
The code and data for this analysis are available from this GitHub repository. Pull requests are welcome!
Please note that you will need to register your own Twitter application to retrieve the data from the API.
If you want to make pretty hairball network visualizations like those above, download the “boulder_politics.gexf” file from GitHub and load it into Gephi. (Windows users will need to do some extra work.)