“Family trees” of online communities

Genealogical relationships among a sample of subreddits.
  1. Characterizing genealogical graphs. What is a genealogical relationship in an online community? We have a preliminary quantitative method that identifies parent-child relationships between communities from the temporal sequences in their users’ public activity logs, and we propose further extensions of the method and applications to platforms like Reddit and Wikipedia.
  2. Validating genealogical graphs. Does our genealogical construct capture substantive relationships between online communities? We will employ a battery of mixed-methods approaches such as trace ethnography, trace interviews, and focus groups to validate the genealogical relationship constructs. This triangulation step will elicit alternative definitions of genealogies, produce labeled data, and identify outliers that will require induction and iteration to generate more robust constructs.
  3. Evaluating community processes. How do genealogical relationships explain community processes? We will analyze how processes like growth and norms are influenced by genealogical relationships. We propose to examine how genealogical graphs relate to community success through a prediction framework and to study the effectiveness of features based on genealogical graphs.
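
As a rough illustration of the first aim, here is a toy sketch under stated assumptions. It is not the project's actual method, which this post does not spell out. It assumes activity logs arrive as (user, community, timestamp) tuples and treats community A as a candidate parent of community B when a large share of B's users were active in A before their first activity in B. The function names and the `min_share` threshold are hypothetical choices made only for this example.

```python
from collections import defaultdict
from itertools import permutations


def first_activity(events):
    """Map each user to {community: earliest timestamp} from (user, community, ts) events."""
    first = defaultdict(dict)
    for user, community, ts in events:
        seen = first[user]
        if community not in seen or ts < seen[community]:
            seen[community] = ts
    return first


def candidate_parents(events, min_share=0.5):
    """Return {child: (parent, share)} under a simple, assumed precedence heuristic.

    share = fraction of the child's users whose first activity in the parent
    came strictly before their first activity in the child.
    min_share is a hypothetical cutoff, not a value taken from the project.
    """
    first = first_activity(events)
    members = defaultdict(set)   # community -> users ever active there
    precedes = defaultdict(int)  # (a, b) -> users active in a before b
    for user, seen in first.items():
        for community in seen:
            members[community].add(user)
        for a, b in permutations(seen, 2):
            if seen[a] < seen[b]:
                precedes[(a, b)] += 1

    best = {}
    for (a, b), count in precedes.items():
        share = count / len(members[b])
        # Keep only the strongest candidate parent for each child.
        if share >= min_share and (b not in best or share > best[b][1]):
            best[b] = (a, share)
    return best
```

Calling `candidate_parents` on a list of such tuples yields at most one candidate parent per community. A heuristic this simple can produce cycles and ignores community size and creation dates, which is one reason the validation work in the second aim matters.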

Previous findings

Chenhao and I started collaborating on this project because of our shared interest in understanding the social dynamics of online communities from a sequence-analysis perspective.

Ethical and privacy considerations

The proposed research necessarily involves tracking user activity across contexts, which raises important ethical and privacy concerns. First, users present different identities to different groups, but research designs can collapse these contexts together and upset users’ imagined audiences. Second, the fact that users’ trace data are accessible through public APIs does not automatically exempt them from ethical concerns. Third, while the policies governing ethical review boards in the United States treat research using digital trace data as less risky to participants than other research designs, our colleague Casey Fiesler has documented that social media users express reservations about their content being used for research.

Engaging community members

We plan to run interviews and focus groups with moderators, administrators, and other leaders of the sub-communities we analyze. While Wikipedia has a regular community gathering (Wikimania), there is no analogous “RedditCon”. The closest thing is Content Moderation at Scale, but if you are aware of any conferences, workshops, panels, etc. where Reddit moderators gather in person, please get in touch!

Recruiting a post-doctoral research associate

As part of this grant, we are looking to recruit a post-doc for up to two years. The ideal candidate will be familiar with the history and culture of Reddit and/or Wikipedia, want to develop skills in computational, quantitative, and qualitative research methods, and be eager to help shape the research agenda of human-centered data science. Interested applicants should apply here.



Brian Keegan


{Social, Data, Network, Information} Scientist. @CUInfoScience assistant professor.