This is the second in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week. Recall that the average of all the votes a username made is called ‘averagevote’. If somebody persistently downvoted links, they’d have a negative averagevote. If they upvoted everything they saw, they’d have an averagevote of +1. Read the histogram below. The three takeaways are: Negativity follows a reverse long tail. (It really happens: see how the figures fall away to the left.) On average, usernames upvoted what they saw (average 0.79). There are bumps at 0 (related to a methodological note) and at -1. By now, two of my good friends in London[…]
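A minimal sketch of how averagevote could be computed, assuming each vote is stored as +1 (up) or -1 (down) alongside a userid and a link. The function names and the toy rows are hypothetical, and treating the 0.79 figure as the mean of per-user averagevotes (every username weighted equally) is an assumption, not something the post confirms.

```python
from collections import defaultdict


def average_votes(rows):
    """Return {userid: averagevote} from (vote, userid, link) rows.

    Assumes each vote is +1 (upvote) or -1 (downvote).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for vote, userid, _link in rows:
        totals[userid] += int(vote)
        counts[userid] += 1
    return {user: totals[user] / counts[user] for user in totals}


def mean_of_averages(averages):
    """Average the per-user averagevotes, weighting every username equally."""
    return sum(averages.values()) / len(averages)


# Hypothetical toy data, not taken from the Reddit dataset.
rows = [
    (1, "alice", "link1"), (1, "alice", "link2"),
    (1, "bob", "link1"), (-1, "bob", "link3"),
    (-1, "carol", "link2"),
]
averages = average_votes(rows)
print(averages)
print(mean_of_averages(averages))
```

Here alice upvotes everything (+1), carol only downvotes (-1), and bob's mixed votes cancel out to 0, which is one way a bump at 0 can arise.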

This is the first in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week. So who keeps on downvoting you on Reddit? We’ll find out. But first, three notes: You may be familiar with Reddit. If you’re not, you can read this explanation of what Reddit is. To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset is a 29MB gzip-compressed file containing 7,405,561 votes from 31,927 users over 2,046,401 links. You can read about the methodology here. The file contains three columns: a vote, a userid, and a link. Only people who had[…]
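A sketch of how the headline counts (votes, users, links) could be reproduced from a gzip-compressed file with those three columns. The layout is an assumption: comma-separated, columns ordered vote, userid, link, no header row. The function name `summarize_votes` is hypothetical.

```python
import csv
import gzip


def summarize_votes(path):
    """Count votes, distinct userids and distinct links in a gzip'd CSV.

    Hypothetical layout: one vote per row, columns ordered vote,userid,link,
    comma-separated, no header row.
    """
    users, links = set(), set()
    n_votes = 0
    # "rt" opens the compressed file in text mode so csv.reader can parse it.
    with gzip.open(path, "rt", newline="") as fh:
        for _vote, userid, link in csv.reader(fh):
            n_votes += 1
            users.add(userid)
            links.add(link)
    return n_votes, len(users), len(links)
```

For example, `summarize_votes("votes.csv.gz")` (a placeholder filename, not the dataset's real one) returns a `(votes, users, links)` tuple. Sets are used because the question is how many *distinct* users and links appear, not how often each one votes.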

Kurt wrote an excellent post about building a data science team. It’s worth reading. To expand on his points: The first 90 days provide fuel for the subsequent 180. The 180 days after are far muddier, because whatever scaled in very unsophisticated interfaces requires a lot more work to become an elegant solution. Data scientists should evangelize evidence and do what they can to develop interfaces that democratize the data. The math is a means to the end. My own reflections: I’m extremely thankful for my years of experience with Information Architects and Designers, because now, when I go into a room and they’re not around, I actively think about that end state. I’m glad I’ve[…]

A fellow data scientist and I were debating how to answer a very specific question that others ask all the time. How would we answer it? I grabbed a piece of paper and drew a histogram. A histogram: Plots a single variable along the X-axis. Plots the occurrence, or frequency, of each value of that variable along the Y-axis. Is used by statisticians and analysts to understand the frequency distribution of a given variable. I said: “This is how I would want to see the data. This is how I answer the question today. This is what I would want to compare.” Then paused. Reflected. And added, “I am not the end user.” The end user isn’t a statistician, marketing[…]
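To make the definition above concrete, here is a minimal sketch of the bucketing behind a histogram: values of one variable are grouped into equal-width bins (the X-axis), and each bin's count is its frequency (the Y-axis). The sample values and the 0.5 bin width are hypothetical.

```python
import math
from collections import Counter


def histogram(values, bin_width):
    """Count how many values fall into each bin [edge, edge + bin_width).

    Returns {bin lower edge: frequency}, sorted by edge: the edges are the
    X-axis and the frequencies are the Y-axis.
    """
    # Work with integer bin indices so float rounding can't split a bin.
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {index * bin_width: counts[index] for index in sorted(counts)}


def render(hist):
    """Draw the histogram as text, one row per bin."""
    return "\n".join(f"{edge:+.2f} | {'#' * n}" for edge, n in hist.items())


# Hypothetical sample of a single variable.
hist = histogram([-1, -1, 0, 0.2, 0.79, 0.9, 1, 1, 1], bin_width=0.5)
print(render(hist))
```

The bin width is a design choice: narrow bins show bumps but get noisy, while wide bins smooth the shape and can hide them.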

“Don’t Make Me Think” by Steve Krug is one of my favourite books. I strongly recommend it to web analysts and data scientists. In that spirit, here are a few of my favourite interfaces: pinterest.com, rdio.com, imgur.com. Commonalities: Real choices were made about what to put in and what to leave out; in other words, they are designed. They were not assembled. Not every surface is crammed with stuff. Just because nature abhors a vacuum doesn’t mean you need to cram something into every pixel. It’s obvious what everything does. Simple can be functional. What are your nominations? *** I’m Christopher Berry. I tweet about analytics @cjpberry. I write at christopherberry.ca

What’s the Return On Investment on Marketing? It depends on how soon you want your return. Time is frequently a neglected variable. Recall that marketing had a schism right around 1920: One man went on to found the branding agency, and found salvation through broadcast radio and, later, TV. Another man founded the first direct advertising agency, and continued to find salvation through direct response and cataloging. The schism only really came to a head when digital forced it to. Implications: Evidence for a direct causal inference between marketing treatment and marketing conversion is greatest at the point of sale / point of conversion. Any evidence of causality is severely diluted at the branding / awareness level[…]

You may have read a lot about Foxconn last week. TL;DR summary: Foxconn is the subcontractor that makes the iPhone and iPad. Foxconn’s CEO called his workers animals. Foxconn is probably racking up big-time ILO violations. Here’s the key TL;DR quote, from a current Apple executive: “We don’t have an obligation to solve America’s problems. Our only obligation is making the best product possible.” That’s a lot of focus. That’s laser-like precision on a given mission. Because it follows that if Apple produces the best product possible, people won’t care about anything else. Indeed, isn’t s/he right? Don’t the free market and price competition make hypocrites of us all? (Two-Buck Chuck, anybody?) What if they didn’t have to? Implications for Analytics[…]

You may be familiar with smart grids, open data, and Pachube. I was reading a piece on smart data, when suddenly a wild quote appears: “…a group of hackers who demonstrated in early 2012 that it is possible to discern exactly what film someone is watching by analysing the power consumption of their TV via their smart meter, as every film has a unique ‘fingerprint’ of electricity usage.” Oh yes. Confirmed. It happened at a hacking for privacy event. Reactions and Questions: What an unintended consequence of the technology! What other hidden signals might there be in other sources of data? What good might come of re-purposing seemingly noisy/garbage data? Just as William H. Perkin discovered purple dye in waste[…]
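The “fingerprint” idea in the quote can be illustrated with a toy sketch: compare a measured power trace against stored fingerprints and pick the closest one by Pearson correlation. This is not the hackers’ actual method. It assumes each fingerprint is a list of power readings taken at regular intervals, already time-aligned with, and the same length as, the measured trace. The film titles and wattages are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def best_match(trace, fingerprints):
    """Return the title whose fingerprint correlates most strongly with trace."""
    return max(fingerprints, key=lambda title: pearson(trace, fingerprints[title]))


# Hypothetical per-interval wattage readings, not real data.
fingerprints = {
    "Film A": [100, 120, 90, 130, 110],
    "Film B": [130, 90, 120, 100, 95],
}
print(best_match([101, 119, 92, 128, 111], fingerprints))
```

Correlation is used here because it compares the *shape* of the power curve (brightness rising and falling with the picture) while ignoring a constant offset, such as a TV that draws more power overall.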

Thomas L. Friedman wrote a fairly good piece for the New York Times. Its theme is linked to something that has kept public policy makers awake for a very long time: the Productivity Trilemma. These two themes explain part of the reason for the rise of Data Science and how Web Analytics must evolve. To summarize Friedman: The era of average people relying on doing an average job for average pay is over. Technology is more efficient than ever at destroying average jobs. Everybody has to get smarter. To summarize the Productivity Trilemma: Productivity growth causes growth in GDP, producing negative employment effects. Real interest rates outpace the real growth rate of GDP, causing regressive redistribution effects, leading to the[…]

According to Chris Dodd, the response against SOPA was unlike anything he’d seen in his thirty years in politics. He called it a ‘watershed event’. Possibly. Proponents of SOPA argue: Americans are losing their jobs to foreign pirates. National security!!! It’s Caucasian looting: all you kids just want free crap. Opponents of SOPA argue: Externalities. Proponents want to believe that somehow Google made me oppose SOPA. What should concern Chris Dodd even more is that Google had very little incremental effect. Their contribution to the movement was weak compared to what the real grassroots did. There was no astroturf: I learned of SOPA from one of the image boards. It led to a slow-moving reddit[…]