This is the third in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week.
Previously – we covered the nature of the dataset, read histograms, and generated our segments. Now we’re going to examine each segment individually.
Statisticians have very efficient ways to quickly summarize and understand the relationships among variables. The aim here isn’t to be efficient – but to be clear. In that spirit, I give you the histogram below.
- All 4,877 One-Time-Olivers voted exactly one time.
You should LOL. It makes sense, though, right? And now the segment name should make a lot more sense.
The histogram below summarizes how, on average, One-Time-Olivers voted: positive or negative. Since each of them voted only once, that vote is either an upvote or a downvote, so every One-Time-Oliver’s average is exactly +1 or -1.
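To make that per-user average concrete, here’s a minimal Python sketch, assuming each vote is recorded as +1 (upvote) or -1 (downvote). The function name `average_vote` is hypothetical and not part of any Reddit API. With exactly one vote, the average can only land on +1 or -1.

```python
def average_vote(votes):
    """Mean of a user's votes, where +1 is an upvote and -1 is a downvote."""
    if not votes:
        raise ValueError("user has no votes")
    if any(v not in (1, -1) for v in votes):
        raise ValueError("votes must be +1 or -1")
    return sum(votes) / len(votes)

# A One-Time-Oliver has exactly one vote, so the average is exactly +1 or -1.
print(average_vote([1]))            # 1.0
print(average_vote([-1]))           # -1.0
# A hypothetical user with three upvotes and one downvote averages 0.5.
print(average_vote([1, 1, 1, -1]))  # 0.5
```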
- One-Time-Olivers tend to upvote once and are never heard from again.
- In answer to the question “Who’s downvoting you on Reddit?”, it isn’t One-Time-Olivers.
Vanity accounts frequently enter Reddit, they flicker, and they go out. They get discouraged. They never really commit to the bit. That’s what happens to them. The histogram below takes on that familiar long-tail curve.
- There are a lot of Vanity-Vanessas, some 7,527 of them.
- Most of them voted only 2, 3, or 4 times.
So, how did they vote?
The histogram below summarizes the story:
- Vanity-Vanessas upvoted nearly everything they saw, with very few exceptions.
- Very few persistently downvoted everything they saw.
- They’re not the ones downvoting you on Reddit.
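Histograms like these come from binning each user’s average vote. Here’s a minimal Python sketch of that binning, assuming averages fall in [-1, 1] and using a bin width of 0.25. The bin width is my assumption (the charts here don’t state theirs), `bin_averages` is a hypothetical name, and the sample averages are made up for illustration.

```python
from collections import Counter
import math

def bin_averages(averages, width=0.25):
    """Count per-user average votes into bins of `width` over [-1, 1].

    Assumes `width` divides 2 evenly. Each bin is keyed by its lower edge,
    and +1.0 goes into the top bin so users who upvoted everything are kept.
    """
    n_bins = round(2 / width)
    counts = Counter()
    for a in averages:
        if not -1.0 <= a <= 1.0:
            raise ValueError(f"average out of range: {a}")
        index = min(math.floor((a + 1.0) / width), n_bins - 1)
        counts[-1.0 + index * width] += 1
    return dict(sorted(counts.items()))

# Made-up averages: mostly enthusiastic upvoters, one persistent downvoter.
sample = [1.0, 1.0, 1.0, 0.9, 0.5, 0.0, -1.0]
for lower_edge, count in bin_averages(sample).items():
    print(f"{lower_edge:+.2f}: {'#' * count}")
```

Clamping the top index is the one design choice worth noting: without it, every user who upvoted everything would fall off the right edge of the chart.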
Recall that the average username cast 326 votes, and yet I still labeled the segment ranging between 9 and 48 votes as Average-Andy.
That’s because the mean number of votes that Average-Andys cast is 22.25, which is close to the median of 20 for the entire set.
This mixing and abstraction of median, mean, and segmentation isn’t something I expect most people to think about, but I can foresee some getting hung up on it. When you think about an equal segmentation, though, it makes sense that the mean of your middle category should be close to the median of the entire set: the middle segment straddles the median, and the long tail’s extreme values land in the top segment instead of dragging the middle one upward.
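If you want to see the median/mean effect for yourself, here’s a minimal Python sketch. It assumes five equal-sized segments and uses hypothetical, deterministically generated long-tailed vote counts (a Pareto quantile function with alpha = 1), not the real Reddit data. `equal_segments` is a hypothetical helper name.

```python
from statistics import mean, median

def equal_segments(values, k=5):
    """Sort values and split them into k equal-sized segments.

    Any remainder (len(values) % k items) is left out to keep the sketch simple.
    """
    ordered = sorted(values)
    size = len(ordered) // k
    return [ordered[i * size:(i + 1) * size] for i in range(k)]

# Hypothetical long-tailed vote counts from a Pareto (alpha = 1) quantile
# function, 1 / (1 - q). These are NOT the real Reddit numbers.
n = 1000
votes = [1.0 / (1.0 - (i + 0.5) / n) for i in range(n)]

segments = equal_segments(votes, k=5)
middle = segments[len(segments) // 2]

print(f"overall mean:        {mean(votes):.2f}")
print(f"overall median:      {median(votes):.2f}")
print(f"middle-segment mean: {mean(middle):.2f}")
```

The overall mean lands well above the median because a few huge values drag it up, while the middle segment’s mean sits right next to the median. It’s the same shape as 326 versus 20 versus 22.25 in the real data.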
For everybody else – just know that you’re looking at the “average joe redditor” here.
- Average number of votes is 22.25, close to the median of 20 for the whole set.
- Familiar long tail.
- A majority of Average-Andys liked everything they saw; they upvoted everything.
- They downvote more often than Vanity-Vanessas or One-Time-Olivers, but not massively.
- They aren’t downvoting heavily enough to say that they’re the ones downvoting you on Reddit.
By now you’re pretty much a pro at reading these histograms. Frequent-Freds vote frequently. Look at the histogram below.
- Classic long-tail continues.
- Averaging 139.3 votes.
- The unusual bump at the beginning of the series is just magnified by the scale from the previous vote frequency histogram. (It’s fine).
- Far fewer of them are likely to upvote absolutely everything they see.
- There’s significant flattening of the long tail – the average vote is 0.74.
- More of them, on average, are disposed to downvoting.
- For Power-Paulines, the long tail is holding – there’s significant clustering at 1000 and 2000 votes.
- The cause is related to rate limiting within the Reddit API.
- The longest part of the long tail, those power users with thousands and thousands of votes, is bundled together at 2000.
- There are around 500 such power users, representing some 1.5% of all usernames.
- The bump at 0 is caused by 1000 upvotes being averaged out by 1000 downvotes.
- Setting aside the 0s, which tug on the mean, Power-Paulines are on average more prone to downvoting.
- Power-Paulines are downvoting you on Reddit.
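The clusters at 1000 and 2000 and the bump at 0 fit together if each user’s upvote and downvote histories are cut off at 1000 items apiece. The Python sketch below assumes exactly that. `CAP`, `observed_counts`, and `average_vote` are hypothetical names, and the cap is inferred from the clusters described above, not taken from Reddit’s API documentation.

```python
# Hypothetical per-direction cap, inferred from the clusters at 1000 and 2000.
# This is an assumption for illustration, not a documented Reddit API constant.
CAP = 1000

def observed_counts(true_up, true_down, cap=CAP):
    """Return the (upvotes, downvotes) we'd see if each list is truncated at `cap`."""
    return min(true_up, cap), min(true_down, cap)

def average_vote(up, down):
    """Average vote where each upvote counts +1 and each downvote -1."""
    total = up + down
    if total == 0:
        raise ValueError("user has no votes")
    return (up - down) / total

# Hypothetical power users: (true upvotes, true downvotes).
power_users = [(5000, 1200), (3000, 10), (800, 150)]
for true_up, true_down in power_users:
    up, down = observed_counts(true_up, true_down)
    print(f"true={true_up + true_down:5d} observed={up + down:5d} "
          f"average={average_vote(up, down):+.3f}")
```

Under this assumption, anyone with at least 1000 votes in each direction shows up with exactly 2000 observed votes and an average of 0, no matter how lopsided their real voting was.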
Tomorrow we’re going to put a bow on it and bring it all together.
I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca