In our continuing project of describing the social dynamics of the large online communities of Reddit, our research team has thought about growth and its effects on online communities in a variety of ways. We’ve always been interested in the ‘communitiness’ of larger groups online. In our 2018 article published in Social Media + Society, we make the argument that for an online group to be considered a community, a large number of its members must actively participate over an extended period of time. The greater the proportion of a group’s members who actively participate, and the longer those users participate, the more community-like the group is. It is through sustained, widespread participation, we assert, that the bonds of community form.
MEASURING ‘STICKINESS’
Measuring how long members of an online group actively participate as a collective can be a bit tricky. We can imagine one group where a very high percentage of members stick around for a very long time. We might think of this as a ‘cohesive’ group, one with very ‘loyal’ members. This group is, in the parlance of marketing, ‘sticky’ in the sense that upon joining the group, individuals tend to stick there and stay for a long time. Conversely, we can imagine a ‘revolving door’ group in which individuals come and go and rarely stick around.
One thing that makes measuring a subreddit’s ‘stickiness’ (or, if you like, churn, turnover, loyalty, cohesion, etc.) tricky is the dynamism of most subreddits. Over time, we find that the vast majority of contributors to comments within a subreddit’s discourse leave and don’t come back. These individuals are replaced by new entrants, and the community, as a whole, continues to flourish even as its contributing members are continuously swapped out (much the same way cells in certain parts of the human body are continually swapped out for new ones in a constant process of renewal). I surmise this is the case with most online communities, given the ease with which individuals can come and go, and the lack of clear incentives for sticking around.
Simply looking at the average times between individuals’ first and last contributions would give us some clue as to how sticky a subreddit is: stickier subreddits would have a higher average than less sticky subreddits. But this approach fails to account for the dynamic nature of subreddits. In our exploration of the history of r/TwoXChromosomes, we found that events inside and outside of Reddit can dramatically affect a subreddit’s stickiness. We found it better to try to capture a subreddit’s rate of turnover (or, if you like, retention) as it changed over time.
Our imperfect solution was to measure turnover from month to month within subreddits. How many active contributors from one month came back and commented the next month? In a sticky subreddit, many contributors would come back the following month, whereas in a less sticky subreddit, very few would return. But what about commenters who happened to take the next month off, but who returned the following month and gradually became very loyal contributors to the subreddit’s discourse?
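The month-to-month measure described above can be sketched in a few lines of Python. This is a minimal illustration rather than our actual tool: the function name `monthly_retention`, the `(author, 'YYYY-MM')` input format, and the toy commenters are all assumptions made for the example.

```python
from collections import defaultdict

def monthly_retention(comments):
    """comments: iterable of (author, 'YYYY-MM') pairs, one per comment.

    Returns {month: share of that month's commenters who also
    commented in the following calendar month}.
    """
    authors_by_month = defaultdict(set)
    for author, month in comments:
        authors_by_month[month].add(author)

    def next_month(month):
        year, mon = map(int, month.split("-"))
        year, mon = (year + 1, 1) if mon == 12 else (year, mon + 1)
        return f"{year:04d}-{mon:02d}"

    retention = {}
    for month, authors in authors_by_month.items():
        following = authors_by_month.get(next_month(month))
        if following is None:
            # Skip months whose following month has no comments in the
            # data (for example, the last month observed).
            continue
        retention[month] = len(authors & following) / len(authors)
    return retention

# Made-up commenters: bob skips September but returns in October.
toy = [("ann", "2019-08"), ("bob", "2019-08"),
       ("ann", "2019-09"), ("cat", "2019-09"),
       ("bob", "2019-10"), ("cat", "2019-10")]
print(monthly_retention(toy))  # {'2019-08': 0.5, '2019-09': 0.5}
```

In the toy data, bob counts as lost after August even though he comes back in October, which is exactly the blind spot raised above.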
THE COHORT APPROACH
One way to account for this would be to look at commenter ‘cohorts’ over time. We can think of individuals who started contributing to a subreddit in a given month as a ‘cohort’ and follow that cohort over time. If 1,000 people started commenting in r/science in August 2019, how many of them come back in September, October, November, etc.?
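As a rough sketch of what following a cohort involves, the Python below assigns each commenter to the month of their first comment and then counts how many members of one cohort show up in each later month. The function name `cohort_returns` and the `(author, 'YYYY-MM')` input format are assumptions made for illustration, not a description of our tool's internals.

```python
from collections import defaultdict

def cohort_returns(comments, cohort_month):
    """Follow the cohort whose first comment fell in cohort_month.

    comments: iterable of (author, 'YYYY-MM') pairs.
    Returns (cohort_size, {later_month: cohort members active that month}).
    """
    months_by_author = defaultdict(set)
    for author, month in comments:
        months_by_author[author].add(month)

    # 'YYYY-MM' strings sort chronologically, so min() is the first month.
    cohort = {author for author, months in months_by_author.items()
              if min(months) == cohort_month}

    returns = defaultdict(int)
    for author in cohort:
        for month in months_by_author[author]:
            if month > cohort_month:
                returns[month] += 1
    return len(cohort), dict(sorted(returns.items()))

# Made-up commenters: ann and bob form the August 2019 cohort.
toy = [("ann", "2019-08"), ("bob", "2019-08"),
       ("ann", "2019-09"), ("cat", "2019-09"),
       ("bob", "2019-10"), ("cat", "2019-10")]
print(cohort_returns(toy, "2019-08"))  # (2, {'2019-09': 1, '2019-10': 1})
```

Because each later month is counted independently, a commenter who takes a month off and then returns is still credited in the month they come back.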
And so we developed a tool for analyzing commenter cohort retention in a given subreddit. We’ve been testing it out on a few subreddits, starting with r/TwoXChromosomes. r/TwoXChromosomes was an interesting test case in part because of its significant fluctuations in size. When it was added to the default list of subreddits shown to new users, its population of contributors trebled overnight. After a period of political ferment leading up to the Women’s March on Washington D.C. in January 2017, the subreddit rapidly shrank. A similarly dynamic subreddit has been r/atheism, which flourished in its fifth and sixth years of existence and peaked with a cohort size of around 32,000 in March 2012 before shrinking to cohorts of around 5,000-6,000 in subsequent years.
As mentioned at the start of this post, we’ve always been interested in growth: how communities handle it, and how it affects the character of communities. This tool allows us to examine how growth affects subreddits’ stickiness. Larger cohorts suggest a kind of robustness: if a group attracts more and more new contributors, it seems to be flourishing! But what if those new contributors don’t stick around? It’s easy to imagine larger cohorts as somehow more ‘casual’ in their connection to the group, while smaller cohorts might be more passionate or interested in the group and thus stick around longer. So, I had a hunch that the larger the cohort, the less likely contributors would be to stick around.
Of course, if cohorts were sufficiently large (30,000, as opposed to 3,000), the raw number of contributors who stuck around would be higher than in smaller cohorts. Even if only 5% of a 30,000-person cohort stuck around (1,500), this would be a larger addition to the number of loyal contributors to a subreddit than would be yielded by a 3,000-person cohort that retained 40% of its contributors (1,200). What we were more interested in was the percentage of the cohort that was retained over time, rather than the raw number retained.
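The arithmetic in that comparison is simple enough to check directly. The snippet below uses only the hypothetical cohort sizes and retention rates from the paragraph above; the helper name `retained` is ours.

```python
def retained(cohort_size, retention_rate):
    """Number of cohort members who come back, for a rate between 0 and 1."""
    return round(cohort_size * retention_rate)

large = retained(30_000, 0.05)  # big cohort, low retention rate
small = retained(3_000, 0.40)   # small cohort, high retention rate
print(large, small)  # 1500 1200
```

The larger cohort adds more returning contributors in raw terms even though its retention rate is one eighth of the smaller cohort's, which is why we compare percentages rather than counts.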
The first thing we might focus on, for simplicity’s sake, is retention the month immediately following the first month of each cohort. If 1,000 people start contributing comments to a subreddit in August 2018, what percentage of them come back and contribute in September 2018? Below is an interactive chart that displays subreddit cohort size on the X axis and cohort retention percentage on the Y axis. We have several subreddits’ worth of data to look at (click on the boxes next to each subreddit to show or hide them).
As you can see, the general pattern is that when subreddit cohorts are smaller, a larger percentage of the cohort is retained. However, there does appear to be significant variation, both within subreddits and across subreddits. There are some cohorts of 20,000 or 30,000 people in which over 30% are retained in the following month! It’s also clear from this chart that some subreddits are stickier than others, regardless of cohort size. Contributors to r/politics and r/atheism appear to be much more likely to contribute in the month following their first contribution than contributors to r/science. This will make sense to anyone familiar with these subreddits: r/science tends to host discussions that relate to many different sub-topics, not all of which interest the same people. Conversely, conversations on r/politics or r/atheism, though they may also relate to a variety of sub-topics, are sufficiently interesting to a larger portion of active contributors that a higher percentage of them continually chime in.
But what about retention after that second month? As a cohort ages, fewer and fewer of its members tend to return. We’ve found this to be the case in pretty much every cohort we’ve looked at (over 2,000 of them) in a variety of subreddits over a decade or so. However, this process of ‘cohort decay’ might happen at a faster or slower rate, depending on a variety of factors.
How might we visualize the ‘decay’ of a given cohort over time? One way would be to represent each month subsequent to a cohort’s first month with a point on our chart. As a cohort’s retention rate declines, it might be visualized in this way:
As the cohort ages, the percent of contributors retained declines. The above visualization shows a pretty gradual decline, whereas the visualization below shows a more rapid decline:
In this way, we can visualize the decay of each cohort in each subreddit over time. In the chart below, we show the cohort decay in five subreddits. We followed each cohort for 50 months after their inceptions (admittedly an arbitrary number of months). In subreddits that have been around a long time, we were able to show more cohorts (146 cohorts in the case of r/science). In subreddits that were created more recently, we could only depict relatively few cohorts (58 cohorts in the case of r/TwoXChromosomes).
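The numbers behind such decay curves could be assembled along the following lines. This is a hedged sketch, not our production code: the names `month_index` and `decay_curves`, the `(author, 'YYYY-MM')` input format, and the toy commenters are assumptions, and the 50-month horizon is simply a default parameter.

```python
from collections import defaultdict

def month_index(month):
    """Turn 'YYYY-MM' into a running month count so months can be subtracted."""
    year, mon = map(int, month.split("-"))
    return year * 12 + (mon - 1)

def decay_curves(comments, horizon=50):
    """Return {cohort_month: [share of cohort active at age 1..horizon]}.

    A commenter's cohort is the month of their first comment; age is the
    number of months elapsed since that first month.
    """
    months_by_author = defaultdict(set)
    for author, month in comments:
        months_by_author[author].add(month_index(month))

    cohort_sizes = defaultdict(int)
    active = defaultdict(lambda: defaultdict(int))  # cohort start -> age -> count
    for months in months_by_author.values():
        start = min(months)
        cohort_sizes[start] += 1
        for m in months:
            age = m - start
            if 1 <= age <= horizon:
                active[start][age] += 1

    curves = {}
    for start, size in sorted(cohort_sizes.items()):
        label = f"{start // 12:04d}-{start % 12 + 1:02d}"
        curves[label] = [active[start][age] / size
                         for age in range(1, horizon + 1)]
    return curves

# Made-up commenters, followed for two months instead of fifty.
toy = [("ann", "2019-08"), ("bob", "2019-08"),
       ("ann", "2019-09"), ("cat", "2019-09"),
       ("bob", "2019-10"), ("cat", "2019-10")]
print(decay_curves(toy, horizon=2))
# {'2019-08': [0.5, 0.5], '2019-09': [1.0, 0.0]}
```

One caveat of this sketch: ages that fall after the end of the data appear as 0.0, so it does not distinguish "nobody returned" from "not yet observed". A fuller version would truncate each curve at the last month for which data exist.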
If we compare certain subreddits, we find some interesting patterns. For example, if you look at r/science and r/personalfinance (labeled here as ‘PFINANCE’), you see that when cohorts are smaller, r/science cohorts tend to retain a higher percentage of contributors and decay more slowly than cohorts in r/personalfinance. But as cohorts grow to between 10,000 and 15,000, it’s the r/personalfinance cohorts that retain more contributors and decay at a slower rate. This suggests that larger r/science cohorts are made up of a larger proportion of transient contributors than is the case with comparable cohorts in r/personalfinance. For r/science, growth appears to be antithetical to communitiness in a way that it is not for other subreddits.
Of course, we’re in the realm of conjecture at this point, just testing out our tool and seeing what types of insights it might yield. We continue to build our repository of data, and in doing so, hope to reveal more of the underlying dynamics of online communities.
Thanks to Felipe Hoffa (u/fhoffa) for making Reddit comment data available for analysis using Google BigQuery, and thanks to Jamie Witter for Tableau advice. Data analyzed by Wyatt Harrison, Jue Hou, and Elliot Panek at the University of Alabama.