This is the fifth in a series of five posts about Reddit and Analytics.

Previously, we covered the nature of the dataset, read histograms, generated segments, and saw that the most frequent users of Reddit do the most downvoting, by an astounding margin.

But wait, there’s more.

Recall, however, that there were over 7 million votes cast: 1.8 million were downvotes and 5.5 million were upvotes. Read the statistics table below to verify that.


  • Upvotes outnumber downvotes.
  • The interface of Reddit itself causes upvotes to accumulate.
  • Reddit itself is a cause of a bias – probably by design.
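For a quick sanity check on that imbalance, here is a minimal sketch that uses only the rounded figures quoted above (5.5 million upvotes, 1.8 million downvotes). The variable names are mine, and the results are only as precise as those rounded inputs.

```python
# Rounded figures quoted in the post, in millions of votes.
upvotes_m = 5.5
downvotes_m = 1.8

total_m = upvotes_m + downvotes_m      # all votes cast
upvote_share = upvotes_m / total_m     # fraction of votes that were upvotes
ratio = upvotes_m / downvotes_m        # upvotes cast per downvote

print(f"total votes: {total_m:.1f} million")
print(f"upvote share: {upvote_share:.1%}")
print(f"upvotes per downvote: {ratio:.2f}")
```

On these rounded numbers, roughly three out of every four votes are upvotes.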

The histogram below is by links – the content getting upvoted or downvoted. There were just over 2 million links submitted. On average, each link received 3.62 upvotes. Given everything you know about long tails, think about just how deceptive that 3.62 mean figure is. Note how you can’t even see the bumps in the tail. And be in awe of the efficiency of the collective Reddit behavior that causes popular content to be disproportionately promoted while even ‘good’ or ‘average’ content gets relentlessly shifted to the left, all by a very small group of people.


  • The long tail is long and powerful.
  • This small group of Power-Paulines is far more likely to downvote because of a much higher frequency of use.
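To see why a mean can mislead in a long tail, here is a small sketch on synthetic data. This is not the Reddit dataset: the 1/rank shape, the 10,000 made-up links, and every name in the code are assumptions chosen purely for illustration.

```python
from statistics import mean, median

# Synthetic long tail (an assumption, not Reddit data): the link at
# popularity rank r gets 10_000 // r votes, so a handful of links get
# thousands of votes and half of them get exactly one.
N_LINKS = 10_000
votes = [N_LINKS // rank for rank in range(1, N_LINKS + 1)]

avg = mean(votes)
mid = median(votes)

# Fraction of links that sit below the mean.
share_below_mean = sum(v < avg for v in votes) / N_LINKS

# Fraction of all votes captured by the top 1% of links.
top_n = N_LINKS // 100
top_share = sum(sorted(votes, reverse=True)[:top_n]) / sum(votes)

print(f"mean votes per link:   {avg:.2f}")
print(f"median votes per link: {mid}")
print(f"links below the mean:  {share_below_mean:.0%}")
print(f"votes held by top 1%:  {top_share:.0%}")
```

In this toy distribution the mean sits several times above the median, and the large majority of links fall below it. That is the sense in which a single average, like the 3.62 figure, says little about the typical link.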

I’m thanking Reddit for making so many APIs publicly available, which enables this sort of analysis and exploration. Thank you.


