<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ChristopherBerry.ca &#187; Analytics</title>
	<atom:link href="http://christopherberry.ca/category/analytics/feed/" rel="self" type="application/rss+xml" />
	<link>http://christopherberry.ca</link>
	<description></description>
	<lastBuildDate>Mon, 21 May 2012 20:10:32 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Testing Three Themes</title>
		<link>http://christopherberry.ca/2012/04/theme-testing/</link>
		<comments>http://christopherberry.ca/2012/04/theme-testing/#comments</comments>
		<pubDate>Sun, 15 Apr 2012 17:49:19 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=880</guid>
		<description><![CDATA[Post frequency on the analytics focused blog, Eyes on Analytics has increased to daily. In part, this is to solidify the understanding of the frequency-reach curve in blogging, and in part, it&#8217;s an attempt to understand where the broader market is at. I&#8217;m testing three themes: How to fight nature&#8217;s pesky way of inhibiting our [...]]]></description>
			<content:encoded><![CDATA[<p>Post frequency on the analytics-focused blog, <a title="Eyes on Analytics" href="http://christopher-berry.blogspot.ca/" target="_blank">Eyes on Analytics</a>, has increased to daily. In part, this is to solidify the understanding of the frequency-reach curve in blogging, and in part, it&#8217;s an attempt to understand where the broader market is at.</p>
<p><strong>I&#8217;m testing three themes:</strong></p>
<ul>
<li>How to fight nature&#8217;s pesky way of inhibiting our ability to make clean causal statements.</li>
<li>The importance of imagination in identifying independent variables.</li>
<li>The role of evidence in decision making.</li>
</ul>
<p>Simplification of a message is not pandering. However, many pandering statements are deliberate simplifications.</p>
<p><strong>If your optimization objective is to gain followers:</strong></p>
<ul>
<li>Post often.</li>
<li>Post simply.</li>
<li>Post what people want to hear.</li>
</ul>
<p>I&#8217;m choosing simplification while avoiding pandering.</p>
<p>Let&#8217;s see how that unfolds over the next 60 days.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/04/theme-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why don&#8217;t the campaign components add up?</title>
		<link>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/</link>
		<comments>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/#comments</comments>
		<pubDate>Sat, 17 Mar 2012 16:04:18 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=878</guid>
		<description><![CDATA[Sometimes the components of a marketing channel will not add up to equal the total performance of the marketing channel. This is caused by any number of realities and limitations imposed in part by nature, and, in part, by you, the marketer. Consider the following deliberately simple scenario: March 2012 Impressions: Total Digital Impressions Delivered: [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes the components of a marketing channel will not add up to equal the total performance of the marketing channel. This is caused by any number of realities and limitations imposed in part by nature, and, in part, by you, the marketer.</p>
<p>Consider the following <strong>deliberately simple</strong> scenario:</p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
<li><strong>Total Impressions with Chicken Creative:</strong> 25,000,000</li>
<li><strong>Total Impressions with Beef Creative:</strong> 50,000,000</li>
<li><strong>Total Impressions with Pork Creative:</strong> 75,000,000</li>
</ul>
<p>Something doesn&#8217;t make sense. I&#8217;m telling you that 100,000,000 impressions were delivered in total, but the components of that figure (25 million, 50 million, and 75 million) add up to 150 million, not 100 million.</p>
<p>That&#8217;s because creative can have multiple attributes. An ad may feature Chicken alone, Beef alone, or Pork alone. An ad may feature Beef with Pork, Chicken with Beef, or Chicken with Pork. In a crazy twist, perhaps some creative features all three! (The madness!) Whenever a single thing can have multiple attributes, the same impression gets counted once under every attribute its ad carries, and the components overlap.</p>
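<p>A minimal Python sketch of the overlap problem. The three ads and their impression counts are hypothetical, chosen only so the totals match the scenario above: one ad features Pork alone, one features Beef with Pork, and one features Chicken alone. Every impression is credited to each attribute its ad carries, so the attribute totals sum to more than the channel total.</p>

```python
from collections import Counter

# Hypothetical ads: (impressions, set of creative attributes the ad features).
ads = [
    (25_000_000, {"Pork"}),
    (50_000_000, {"Beef", "Pork"}),  # one ad features two meats
    (25_000_000, {"Chicken"}),
]

# The channel total counts each impression exactly once.
total = sum(impressions for impressions, _ in ads)

# An impression is credited once for *every* attribute its ad carries,
# so the attribute totals can exceed the channel total.
by_attribute = Counter()
for impressions, attributes in ads:
    for attribute in attributes:
        by_attribute[attribute] += impressions

print(total)                               # 100000000
print(dict(sorted(by_attribute.items())))  # {'Beef': 50000000, 'Chicken': 25000000, 'Pork': 75000000}
print(sum(by_attribute.values()))          # 150000000
```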
<p>The next scenario demonstrates complications that arise because of instrumentation:</p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
<li><strong>Total Impressions served to Males:</strong> 60,000,000</li>
<li><strong>Total Impressions served to Females:</strong> 10,000,000</li>
<li><strong>Total Impressions likely served to 35 to 50 year olds:</strong> 1,000,000</li>
</ul>
<p>All people have attributes, but not all people have attributes that can be measured.</p>
<p>It might very well be that, for the Xbox Live component, Microsoft can report with greater certainty, owing to profile information, that the content was served to males. And, because that particular app was geared towards males, there&#8217;s greater certainty on that end. It also might be the case that another component ran on mommy blogger ad networks, where the ad targeter was ethical and wasn&#8217;t uniquely tracking everybody. So the &#8216;missing&#8217; 30 million impressions (100 million, less 60 million to males and 10 million to females) aren&#8217;t missing; they just weren&#8217;t attributed to a gender.</p>
<p>The same goes for the age component. Based on Quantcast data, we may hypothesize that the impressions served on mommy blog networks went heavily to 35 to 50 year old females, but nothing in the instrumentation itself confirms that hypothesis.</p>
<p>Just because it may be measurable doesn&#8217;t guarantee that it will be measured.</p>
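<p>One way to keep unmeasured impressions from looking &#8216;missing&#8217; is to report them as their own row. This Python sketch uses only the gender figures from the scenario above; the &#8220;Not measured&#8221; label is an illustrative choice, not something an ad server necessarily reports.</p>

```python
# Gender breakdown as reported by the instrumentation (scenario above).
total = 100_000_000
reported = {"Male": 60_000_000, "Female": 10_000_000}

# Whatever the instrumentation could not attribute is "not measured",
# which is not the same thing as "missing".
not_measured = total - sum(reported.values())
breakdown = {**reported, "Not measured": not_measured}

for label, impressions in breakdown.items():
    print(f"{label:>12}: {impressions:>11,} ({impressions / total:.0%})")
```

<p>With the unmeasured remainder shown explicitly (30 million, or 30%), the rows add back up to the 100 million total.</p>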
<p><strong>Finally, consider the complexity imposed by time:</strong></p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
<li><strong>Total Impressions from Affiliate Program:</strong> 10,000,000</li>
<li><strong>Total Impressions from the RayRayHayHay campaign:</strong> 8,000,000</li>
<li><strong>Total Impressions from the A campaign:</strong> 1,000,000</li>
<li><strong>Total Impressions from the Eh campaign:</strong> 1,000,000</li>
</ul>
<p>Well, CLEARLY the A campaign and the Eh campaign failed &#8211; since the affiliates didn&#8217;t use those creative treatments much at all. What we don&#8217;t know is time.</p>
<ul>
<li><strong>Date the RayRayHayHay campaign creative was posted:</strong> January 5, 2012</li>
<li><strong>Date the A campaign creative was posted:</strong> March 1, 2012</li>
<li><strong>Date the Eh campaign creative was posted:</strong> March 28, 2012</li>
</ul>
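<p>That&#8217;s 1 million impressions served in four days (March 28 to 31) for the Eh campaign, and 1 million impressions served over 31 days for the A campaign. Per day live, the Eh creative is running at nearly eight times the rate of the A creative.</p>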
<p>Such component analysis is made particularly tricky when we&#8217;re trying to do it using a monthly report or some other arbitrary unit of time.</p>
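<p>A rough way to put the three campaigns on an equal footing is to divide March impressions by the number of days each creative was actually live in March. The Python sketch below assumes a creative counts as live on the day it was posted and through March 31 inclusive, which gives four days for Eh and 31 for both A and RayRayHayHay (live since January).</p>

```python
from datetime import date

# Reporting window for the monthly report.
month_start, month_end = date(2012, 3, 1), date(2012, 3, 31)

# March impressions and the date each creative was posted (scenario above).
campaigns = {
    "RayRayHayHay": (8_000_000, date(2012, 1, 5)),
    "A": (1_000_000, date(2012, 3, 1)),
    "Eh": (1_000_000, date(2012, 3, 28)),
}


def days_live_in_window(posted: date) -> int:
    """Days the creative was live inside the window, counting both ends."""
    start = max(posted, month_start)  # creative posted before March counts from March 1
    return (month_end - start).days + 1


daily_rate = {}
for name, (impressions, posted) in campaigns.items():
    days = days_live_in_window(posted)
    daily_rate[name] = impressions / days
    print(f"{name}: {days} days live, {daily_rate[name]:,.0f} impressions/day")
```

<p>Under that assumption, Eh runs at about 250,000 impressions per day against about 32,000 for A, the opposite of what the raw monthly totals suggest.</p>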
<p><strong>In sum:</strong></p>
<p>Channel performance analysis is not channel component analysis. These are two distinct types of analytics, aimed at answering two different classes of questions. For the reasons listed above (attribute overlap, instrumentation limitations, and time), the sum of the components may not add up to the total. This is not a devastating realization if you understand the differences and how to think about them.</p>
<p>There&#8217;s a general optimistic sense that drillability, the ability to drill into any metric and see its components, is possible in all contexts. It is possible in some contexts. It is not possible in all contexts. Privacy and technical disruption impose long-run constraints on ever being able to achieve that.</p>
<p>It&#8217;s not likely to be perfect any time soon, and, in some cases, the components won&#8217;t ever add up.</p>
<p>***</p>
<p>(Note to fellow analysts: I chose impressions to keep it really simple. On-site and post-click analysis is required. Statistical analysis exists for a reason, so, even armed with impression and CTR data, you may analyze performance across multiple attributes. Moreover, you ought to be aware of the biases that exist in your data set &#8211; is it the case that males really did respond better, or is it the case that the instrumentation is just better at identifying males?)</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Who&#8217;s Downvoting You On Reddit?</title>
		<link>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/</link>
		<comments>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/#comments</comments>
		<pubDate>Sun, 12 Feb 2012 14:09:44 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Marketing Science]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=850</guid>
		<description><![CDATA[So who keeps on downvoting you on Reddit? We&#8217;ll find out. But first &#8211; three notes: You may be familiar with Reddit. If you&#8217;re not &#8211; you can read this explanation about what Reddit is. To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset [...]]]></description>
			<content:encoded><![CDATA[<p>So who keeps on downvoting you on Reddit? We&#8217;ll find out.</p>
<p>But first &#8211; three notes:</p>
<ul>
<li>You may be familiar with <a href="http://www.reddit.com/" target="_blank">Reddit</a>. If you&#8217;re not, you can read this explanation of <a href="http://christopherberry.ca/whats-reddit/" target="_blank">what Reddit is</a>.</li>
<li>To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset is a 29 MB gzip-compressed file containing 7,405,561 votes from 31,927 users over 2,046,401 links. You can read about the <a href="http://christopherberry.ca/methodology-for-whos-downvoting-you-on-reddit/" target="_blank">methodology here</a>.</li>
<li>The file contains three columns: a vote, a userid, and a link. Only users whose privacy settings were set to public had their votes read through the API. There is no metadata about who these people are in real life (IRL), or even about the nature of the content they were upvoting and downvoting.</li>
</ul>
<p><strong>So, who&#8217;s downvoting you on reddit?</strong></p>
<p>To find out, I took that huge file and transformed it into another one, boiling it down to one row per user: the username, how many times that username voted (numberofvotes), and the average of all their votes (averagevote).</p>
<p>You can see below that _mike voted 26 times, and, if you take the average of all his votes, +1 for an upvote and -1 for a downvote, it turns out to be -0.92. Basically, _mike didn&#8217;t like a lot of what he saw. In fact, _mike upvoted once (+1) and downvoted 25 times (-25). So (-25) + (+1) is -24, and -24/26 is -0.92.</p>
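<p>The aggregation described above can be sketched in Python. This is an illustration, not the exact process used for this post: <code>read_rows</code> is a hypothetical helper that assumes the raw file is a header-less, gzip-compressed CSV with columns in the order vote, userid, link, and the in-memory rows simply reproduce _mike&#8217;s 1 upvote and 25 downvotes (the link ids are made up).</p>

```python
import csv
import gzip
from collections import defaultdict


def read_rows(path):
    """Stream (vote, userid, link) rows from a gzip-compressed CSV.

    Hypothetical: assumes a header-less CSV whose columns are in the order
    vote, userid, link, with votes stored as 1 or -1.
    """
    with gzip.open(path, "rt", newline="") as f:
        for vote, userid, link in csv.reader(f):
            yield int(vote), userid, link


def summarize(rows):
    """Boil vote rows down to {userid: (numberofvotes, averagevote)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for vote, userid, _link in rows:
        counts[userid] += 1
        totals[userid] += vote
    return {user: (counts[user], totals[user] / counts[user]) for user in counts}


# In-memory rows mirroring _mike: one upvote and 25 downvotes.
rows = [(1, "_mike", "link_0")] + [(-1, "_mike", f"link_{i}") for i in range(1, 26)]
numberofvotes, averagevote = summarize(rows)["_mike"]
print(numberofvotes, round(averagevote, 2))  # 26 -0.92
```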
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-users.png"><img class="aligncenter  wp-image-829" title="reddit-users" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-users-300x64.png" alt="" width="390" height="83" /></a></p>
<p>There are over 30,000 usernames here &#8211; and that&#8217;s a lot of data. It&#8217;s really important to visualize the data before you get into any analysis. One way to do that is to plot a histogram.</p>
<p><strong>To read the histogram below, remember:</strong></p>
<ul>
<li>Frequency means &#8216;the number of usernames that fall into this category or range&#8217;.</li>
<li>Numberofvotes means &#8216;the number of times a username voted&#8217;.</li>
<li>Mean is another word for average.</li>
</ul>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/votes-by-people.png"><img class="aligncenter size-full wp-image-834" title="votes-by-people" src="http://christopherberry.ca/wp-content/uploads/2012/02/votes-by-people.png" alt="" width="589" height="493" /></a></p>
<p><strong>There are three takeaways from the histogram above:</strong></p>
<ul>
<li>The average number of votes by a username was 234.</li>
<li>A large number of usernames didn&#8217;t vote very many times at all.</li>
<li>There are bumps at 1000 and 2000 votes. (If you&#8217;re interested in why, see the <a href="http://christopherberry.ca/methodology-for-whos-downvoting-you-on-reddit/" target="_blank">Methodological</a> notes. Incidentally, this is why you should always visualize your data.)</li>
</ul>
<p>A histogram is built from a Frequency Table, which we&#8217;ll see below.</p>
<p><strong>The way to read a frequency table is:</strong></p>
<ul>
<li>The &#8216;Valid&#8217; column means &#8216;how many times a username voted&#8217;.</li>
<li>Frequency means &#8216;the number of usernames that fall into this category&#8217;.</li>
<li>Percent means &#8216;the percentage of all usernames that fall into this category&#8217;.</li>
<li>Cumulative Percent means &#8216;the percentage of all usernames that voted this many times or fewer&#8217;.</li>
</ul>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/numberofvotes-reddit.png"><img title="numberofvotes-reddit" src="http://christopherberry.ca/wp-content/uploads/2012/02/numberofvotes-reddit.png" alt="" width="455" height="509" /></a></p>
<p><strong>There are three takeaways from the Frequency Table above:</strong></p>
<ul>
<li>4877 of the usernames voted only one time (it&#8217;s likely they submitted a single link and never returned).</li>
<li>Note how both the number and the percentage of usernames in each category decrease as the number of votes grows.</li>
<li>50.1% of all the usernames voted 20 times or less. (Look at the cumulative percent column and make sure that makes sense to you. We&#8217;re going to use this column later.)</li>
</ul>
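<p>A frequency table with a cumulative percent column can be built in a few lines of Python. The ten vote counts below are hypothetical and deliberately long-tailed; they are not drawn from the Reddit dataset.</p>

```python
from collections import Counter


def frequency_table(values):
    """Rows of (value, frequency, percent, cumulative percent), sorted by value."""
    counts = Counter(values)
    n = len(values)
    rows, running = [], 0
    for value in sorted(counts):
        running += counts[value]  # usernames at or below this value so far
        rows.append((value, counts[value], 100 * counts[value] / n, 100 * running / n))
    return rows


# Hypothetical numberofvotes for ten usernames.
numberofvotes = [1, 1, 1, 2, 2, 3, 5, 9, 20, 2000]
for value, freq, pct, cum in frequency_table(numberofvotes):
    print(f"{value:>5} {freq:>3} {pct:6.1f}% {cum:6.1f}%")
```

<p>The last column is the running share: in this toy data, half of the usernames voted 2 times or less.</p>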
<p>You may have heard the term &#8216;long tail&#8217; many times before. This is a demonstration of what that means: the bars on the histogram fall away to the right.</p>
<p>Recall that the average of all the votes a username made is called &#8216;averagevote&#8217;. If somebody was persistently downvoting links, they&#8217;d have a negative number. If they upvoted everything they saw, they&#8217;d have an averagevote of +1.</p>
<p>Read the histogram below.  <a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-vote-by-person.png"><img class="aligncenter size-full wp-image-831" title="average-vote-by-person" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-vote-by-person.png" alt="" width="580" height="492" /></a><strong>The three takeaways are:</strong></p>
<ul>
<li>Negativity follows a reverse long tail. (It really happens: see how the figures fall away to the left.)</li>
<li>On average, usernames upvoted what they saw (average 0.79).</li>
<li>There are bumps at 0 (related to a methodological note) and at -1.</li>
</ul>
<p>By now, two of my good friends in London are screaming at the screen. Means are a horrible way to explain long tail distributions. You can see that now too. Means are giving us a pretty skewed view of the world.</p>
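<p>To see how a single heavy user drags the mean away from the typical user, here is a small Python example using the standard <code>statistics</code> module on hypothetical, deliberately long-tailed vote counts.</p>

```python
import statistics

# Hypothetical vote counts for ten usernames; one power user dominates.
numberofvotes = [1, 1, 1, 2, 2, 3, 5, 9, 20, 2000]

print(statistics.mean(numberofvotes))    # 204.4
print(statistics.median(numberofvotes))  # 2.5
```

<p>Nine of the ten hypothetical usernames voted 20 times or fewer, yet the mean lands above 200; the median of 2.5 is a far better description of a typical username.</p>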
<p>The table below is a byproduct of our Frequency table. It&#8217;s aptly labeled &#8216;Statistics&#8217;, and compares two variables, numberofvotes and averagevote, side by side. I&#8217;ve thrown a yellow box around &#8216;percentiles&#8217;. Recall the cumulative percent column from the previous frequency table:</p>
<ul>
<li>22.8% of all usernames voted 2 times or less.</li>
<li>40.8% voted 9 times or less.</li>
</ul>
<p>The program I&#8217;m using is giving me &#8216;break points&#8217; for those percentiles.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-percentiles.png"><img class="aligncenter size-full wp-image-830" title="reddit-percentiles" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-percentiles.png" alt="" width="639" height="477" /></a></p>
<p><strong>Two takeaways:</strong></p>
<ul>
<li>The median gives a better summary of what&#8217;s going on here: half of the usernames voted 20 times or less, and, looking at averagevote, a large set of usernames always upvoted what they saw.</li>
<li>If I know that roughly 80% of all usernames voted 325 times or less, then I know that the remaining 20% of the usernames in my sample voted more than 325 times.</li>
</ul>
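<p>Percentile break points like these can be computed with Python&#8217;s <code>statistics.quantiles</code>. The data below is hypothetical, and different packages define percentiles slightly differently, so the cut points reported by the program used in this post may not match this method exactly.</p>

```python
import statistics

# Hypothetical long-tailed numberofvotes for twenty usernames.
numberofvotes = [1, 1, 1, 1, 2, 2, 3, 4, 6, 9, 12, 20, 25, 40, 48, 90, 150, 325, 1000, 2000]

# Four break points (20th, 40th, 60th, 80th percentiles) give five groups.
# method="inclusive" treats the list as the full population; other tools
# may interpolate between neighbouring values differently.
breaks = statistics.quantiles(numberofvotes, n=5, method="inclusive")
print(breaks)

# The 80th-percentile break point also tells us who sits in the top fifth.
share_above = sum(v > breaks[-1] for v in numberofvotes) / len(numberofvotes)
print(f"{share_above:.0%} of usernames voted more than {breaks[-1]:g} times")
```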
<p>We&#8217;re going to use those percentile cutoff points to inform a segmentation, next.</p>
<p><strong>Segmentation</strong></p>
<p>A segmentation is a grouping of records, usually people, into categories. There is no prescription for how to do this. If you talk to a modeller, they&#8217;ll tell you about their clustering algorithms. If you talk to a machine learning scientist, they&#8217;ll tell you about bump-hunting or unsupervised clustering. Those are all very good algorithms. I use them myself.</p>
<p>I&#8217;m going for simplicity here. I have four percentile cut-off points that are meant to divide people evenly into five categories. And, for further simplicity, instead of referring to a group of people who voted between 9 and 48 times as &#8216;those who voted between 9 and 48 times&#8217;, I&#8217;m going to call them Average-Andys. And I&#8217;ll just keep on calling them that.</p>
<p>At this point, I don&#8217;t know if they&#8217;re male or female. (And we won&#8217;t find out in this post.) And it&#8217;s controversial to use alliteration. But it&#8217;s done.</p>
<p>So, mapping the percentiles against a segmentation, based on how many times a username voted, we have:</p>
<ul>
<li>1 time: One-Time-Oliver</li>
<li>2 to 9 times: Vanity-Vanessa</li>
<li>9 to 48 times: Average-Andy</li>
<li>48 to 325 times: Frequent-Fred</li>
<li>More than 325 times: Power-Pauline</li>
</ul>
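<p>Mapping vote counts onto these segments is a lookup against the cut points. The sketch below uses Python&#8217;s <code>bisect</code> module; because the ranges above share their endpoints (9 and 48 each appear twice), it assumes each shared value belongs to the higher segment, which is an assumption rather than something the ranges settle.</p>

```python
from bisect import bisect_right

SEGMENTS = ["One-Time-Oliver", "Vanity-Vanessa", "Average-Andy", "Frequent-Fred", "Power-Pauline"]

# Lowest numberofvotes that starts each segment after the first.
# Assumption: a shared boundary value (9, 48) goes to the higher segment.
LOWER_BOUNDS = [2, 9, 48, 326]


def equalseg(numberofvotes: int) -> str:
    """Return the segment name for a username's vote count."""
    # bisect_right counts how many lower bounds are <= numberofvotes.
    return SEGMENTS[bisect_right(LOWER_BOUNDS, numberofvotes)]


for n in (1, 4, 9, 48, 325, 2000):
    print(n, equalseg(n))
```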
<p>Take a look at the result below &#8211; a variable I&#8217;m calling &#8216;equalseg&#8217; &#8211; short for &#8216;equal segmentation&#8217;.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/equalsegs.png"><img class="aligncenter size-full wp-image-835" title="equalsegs" src="http://christopherberry.ca/wp-content/uploads/2012/02/equalsegs.png" alt="" width="528" height="190" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>There are 4877 One-Time-Olivers, representing 15.5% of the usernames in the sample.</li>
<li>Vanity-Vanessas represent 23.9% of the usernames.</li>
<li>The last three segments are pretty equally divided; the first two are more lopsided.</li>
</ul>
<p>Even though I aimed to have five groups of people with equal numbers in each, you can see that the division between One-Time-Olivers and Vanity-Vanessas is off. This happens very often when segmenting a long tail into equal groups: thousands of usernames share the exact same vote count, so a cut point can&#8217;t split them. And, while not ideal, it&#8217;s okay for our purposes.</p>
<p>Next, we&#8217;re going to examine each segment individually.</p>
<p><strong>One-Time-Olivers</strong></p>
<p>Statisticians have very efficient ways to quickly summarize and understand the relationships among variables. The aim here isn&#8217;t to be efficient, but to be clear. In that spirit, I give you the histogram below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-votes.png"><img class="aligncenter size-full wp-image-842" title="one-time-oliver-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-votes.png" alt="" width="553" height="485" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>All 4877 One-Time-Olivers voted exactly one time.</li>
</ul>
<p>You should lol. It makes sense though, right? And, the segment name should make a lot more sense.</p>
<p>The histogram below summarizes how, on average, One-Time-Olivers voted &#8211; positive or negative. Since they only voted one time, it&#8217;s either an upvote, or a downvote. A +1 or -1 average.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-averagevote.png"><img class="aligncenter size-full wp-image-841" title="one-time-oliver-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-averagevote.png" alt="" width="567" height="489" /></a></p>
<p><strong> Takeaways:</strong></p>
<ul>
<li>One-Time-Olivers tend to upvote once, and are never heard from again.</li>
<li>To answer the question &#8220;Who&#8217;s downvoting you on Reddit?&#8221;: it isn&#8217;t One-Time-Olivers.</li>
</ul>
<p><strong>Vanity-Vanessa</strong></p>
<p>Vanity accounts frequently enter Reddit, they flicker, and they go out. They get discouraged. They never really commit to the bit. That&#8217;s what happens to them. The histogram below takes on that familiar long-tail curve.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-vote.png"><img class="aligncenter size-full wp-image-844" title="vanity-vanessa-vote" src="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-vote.png" alt="" width="571" height="483" /></a><strong>Takeaways:</strong></p>
<ul>
<li>There are a lot of Vanity-Vanessas, some 7,527 of them.</li>
<li>Most of them voted only 2, 3, or 4 times.</li>
</ul>
<p>So, how did they vote?</p>
<p>The histogram below summarizes the story:</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-averagevote.png"><img class="aligncenter  wp-image-843" title="vanity-vanessa-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-averagevote.png" alt="" width="565" height="491" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Vanity-Vanessas upvoted nearly everything they saw, with very few exceptions.</li>
<li>Very few persistently downvoted everything they saw.</li>
<li>They&#8217;re not the ones downvoting you on Reddit.</li>
</ul>
<p><strong>Average-Andys</strong></p>
<p>Recall that the average username voted 234 times, and yet I still labeled the segment ranging between 9 and 48 votes as Average-Andy. That&#8217;s because the mean number of votes that Average-Andys cast is 22.25, which is close to the median of 20 for the entire set.</p>
<p>This mixing and abstraction of median, mean, and segmentation isn&#8217;t something that I expect most people to consider or think about, but I can foresee some getting hung up on it. When you think about an equal segmentation though, it makes sense that the mean of your middle category should be close to the median of the entire set.</p>
<p>For everybody else, just know that you&#8217;re looking at the &#8220;average joe redditor&#8221; here.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-votes.png"><img class="aligncenter size-full wp-image-840" title="average-andy-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-votes.png" alt="" width="614" height="492" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Average number of votes is 22.25, close to the median of 20 for the whole set.</li>
<li>Familiar long tail.</li>
</ul>
<p>How do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-averagevote.png"><img class="aligncenter size-full wp-image-845" title="average-andy-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-averagevote.png" alt="" width="560" height="485" /></a><strong>Takeaways:</strong></p>
<ul>
<li>A majority of Average-Andys liked everything they saw: they upvoted everything.</li>
<li>They downvote more often than Vanity-Vanessas or One-Time-Olivers, but not massively.</li>
<li>They aren&#8217;t downvoting heavily enough to say that they&#8217;re the ones downvoting you on Reddit.</li>
</ul>
<p><strong>Frequent Fred</strong></p>
<p>By now you&#8217;re pretty much a pro at reading these histograms. Frequent-Freds vote frequently. Look at the histogram below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-votes.png"><img class="aligncenter size-full wp-image-847" title="frequent-fred-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-votes.png" alt="" width="571" height="493" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>The classic long tail continues.</li>
<li>Frequent-Freds average 139.3 votes.</li>
<li>The unusual bump at the beginning of the series is just magnified by the change in scale from the previous vote-frequency histogram. (It&#8217;s fine.)</li>
</ul>
<p>How do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-averagevote.png"><img class="aligncenter size-full wp-image-846" title="frequent-fred-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-averagevote.png" alt="" width="564" height="480" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Far fewer of them are likely to upvote absolutely everything they see.</li>
<li>There&#8217;s significant flattening of the long tail: the average is 0.74.</li>
<li>More of them, on average, are disposed to downvoting.</li>
</ul>
<p><strong>Power Paulines</strong></p>
<p>Power Paulines are the most difficult group to analyze, but the easiest to summarize and understand. Take a look at the histogram below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-votes.png"><img class="aligncenter size-full wp-image-849" title="power-pauline-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-votes.png" alt="" width="585" height="489" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>The long tail is holding, and there&#8217;s significant clustering at 1000 and 2000.</li>
<li>The cause is related to rate limiting within the Reddit API.</li>
<li>The longest part of the long tail, those power users with thousands and thousands of votes, is bundled together at 2000.</li>
<li>There are around 500 such power users, representing some 1.5% of the total usernames.</li>
</ul>
<p>So how do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-voteaverage.png"><img class="aligncenter size-full wp-image-848" title="power-pauline-voteaverage" src="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-voteaverage.png" alt="" width="566" height="488" /></a></p>
<p><strong> Takeaways:</strong></p>
<ul>
<li>The bump at 0 is caused by 1000 upvotes getting averaged out by 1000 downvotes.</li>
<li>The 0s aside, which are tugging on the mean, Paulines are on average more prone to downvoting.</li>
<li>Power Paulines are downvoting you on Reddit.</li>
</ul>
<p><strong>Putting a bow on it</strong></p>
<p>The chart below summarizes the relationship between each segment and its average vote. You can see a clear negative direction: the more one uses Reddit, the more one downvotes, even if the mean is exaggerated in the Power Pauline segment.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-summary.png"><img class="aligncenter size-full wp-image-838" title="reddit-summary" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-summary.png" alt="" width="544" height="355" /></a></p>
<p>To really hammer the point home about the origin of downvotes, take a look at the table below. It&#8217;s broken out by the segments you now understand. It also contains two new variables, upvotes and downvotes: the total number of upvotes and downvotes cast by each segment.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/TotalSUM.png"><img class="aligncenter size-full wp-image-854" title="TotalSUM" src="http://christopherberry.ca/wp-content/uploads/2012/02/TotalSUM.png" alt="" width="656" height="875" /></a></p>
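<p>Because each username&#8217;s summary holds both numberofvotes and averagevote, its upvote and downvote counts can be recovered exactly: upvotes plus downvotes equals numberofvotes, and upvotes minus downvotes equals numberofvotes times averagevote. The Python sketch below uses that identity to total votes by segment. The per-username summaries are hypothetical (the last one is _mike from earlier), and counting directly from the raw vote rows would work just as well.</p>

```python
from collections import defaultdict


def up_down(numberofvotes, averagevote):
    """Recover (upvotes, downvotes) from a username's two summary variables.

    up + down = numberofvotes and up - down = numberofvotes * averagevote,
    so up = numberofvotes * (1 + averagevote) / 2.
    """
    up = round(numberofvotes * (1 + averagevote) / 2)  # round away float noise
    return up, numberofvotes - up


# Hypothetical per-username summaries: (segment, numberofvotes, averagevote).
users = [
    ("One-Time-Oliver", 1, 1.0),
    ("Vanity-Vanessa", 4, 0.5),
    ("Average-Andy", 26, -24 / 26),  # _mike from earlier
]

totals = defaultdict(lambda: [0, 0])  # segment -> [upvotes, downvotes]
for segment, numberofvotes, averagevote in users:
    up, down = up_down(numberofvotes, averagevote)
    totals[segment][0] += up
    totals[segment][1] += down

for segment, (up, down) in totals.items():
    print(f"{segment}: {up} upvotes, {down} downvotes")
```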
<p><strong>Takeaways:</strong></p>
<ul>
<li>One-Time-Olivers as a group were responsible for 175 of all the downvotes cast.</li>
<li>Vanity-Vanessas as a group were responsible for 1781 of all the downvotes cast.</li>
<li>Average-Andys as a group were responsible for 13,258 of all the downvotes cast.</li>
<li>Frequent-Freds as a group were responsible for 120,758 of all the downvotes cast.</li>
<li>Power-Paulines as a group were responsible for 1,672,368 of all the OBSERVED downvotes cast &#8211; but are probably responsible for a lot more in aggregate across all of Reddit. (This sample contains a bias, but bias doesn&#8217;t mean I can&#8217;t say anything at all about anything.)</li>
</ul>
<p>Note the differences in order of magnitude between each group. 1781 is roughly 10 times greater than 175, and the pattern holds, a bit imperfectly, on the way up through Frequent-Freds to Power-Paulines: each segment casts roughly an order of magnitude more downvotes than the one before it.</p>
<p><strong>The greatest power users of Reddit are the ones who are downvoting you &#8211; and it&#8217;s an exponential power.</strong></p>
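<p>The step-ups between segments can be checked directly from the downvote counts listed in the takeaways. A short Python sketch:</p>

```python
# Downvotes cast by each segment, as reported in the takeaways.
downvotes = {
    "One-Time-Oliver": 175,
    "Vanity-Vanessa": 1_781,
    "Average-Andy": 13_258,
    "Frequent-Fred": 120_758,
    "Power-Pauline": 1_672_368,
}

total = sum(downvotes.values())
names = list(downvotes)  # dicts keep insertion order (Python 3.7+)
pairs = list(zip(names, names[1:]))
ratios = [downvotes[cur] / downvotes[prev] for prev, cur in pairs]
share = downvotes["Power-Pauline"] / total

for (prev, cur), ratio in zip(pairs, ratios):
    print(f"{cur} vs {prev}: {ratio:.1f}x")
print(f"Power-Pauline share of observed downvotes: {share:.1%}")
```

<p>Each segment casts between roughly 7 and 14 times the downvotes of the one before it, and Power-Paulines account for about 92.5% of the 1,808,340 observed downvotes.</p>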
<p><strong>But wait, there&#8217;s more.</strong></p>
<p>Recall, however, that there were over 7 million votes cast: 1.8 million were downvotes, and 5.5 million were upvotes. Read the statistics table below to verify that.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/redditvotes.png"><img class="aligncenter size-full wp-image-837" title="redditvotes" src="http://christopherberry.ca/wp-content/uploads/2012/02/redditvotes.png" alt="" width="455" height="383" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>Upvotes outnumber downvotes.</li>
<li>The interface of Reddit itself causes upvotes to accumulate.</li>
<li>Reddit itself is a cause of a bias &#8211; probably by design.</li>
</ul>
<p>The histogram below is by link: the content getting upvoted or downvoted. There were just over 2 million links submitted. On average, each link received 3.62 votes (7,405,561 votes over 2,046,401 links). Given everything you know about long tails, think about just how deceptive that 3.62 mean figure is. Note how you can&#8217;t even see the bumps in the tail. And be in awe of the efficiency of the collective Reddit behavior that causes popular content to be disproportionately promoted while even &#8216;good&#8217; or &#8216;average&#8217; content gets relentlessly shifted to the left, all by a very small group of people.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/votes-per-link.png"><img class="aligncenter size-full wp-image-839" title="votes-per-link" src="http://christopherberry.ca/wp-content/uploads/2012/02/votes-per-link.png" alt="" width="598" height="495" /></a></p>
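<p>The paragraph above says a mean like 3.62 is deceptive on a long-tailed distribution. The minimal Python sketch below illustrates why. It uses made-up data, not the Reddit sample. It assumes a hypothetical Zipf-like tail in which the link ranked k receives 1000 // k upvotes. The helper names are illustrative only.</p>

```python
from statistics import mean, median


def zipf_like_votes(n_links: int, top_votes: int) -> list[int]:
    """Hypothetical long tail: the link ranked k receives top_votes // k upvotes."""
    return [top_votes // rank for rank in range(1, n_links + 1)]


def share_below(values: list[int], threshold: float) -> float:
    """Fraction of values strictly below the threshold."""
    return sum(1 for v in values if threshold > v) / len(values)


# Illustrative data only: 1,000 links, the most popular one gets 1,000 upvotes.
votes = zipf_like_votes(n_links=1_000, top_votes=1_000)
avg = mean(votes)
mid = median(votes)
below = share_below(votes, avg)

# A handful of very popular links at the head pull the mean up,
# while the typical link sits far below it.
print(f"mean={avg:.3f} median={mid} share_below_mean={below:.3f}")
# prints: mean=7.069 median=1.5 share_below_mean=0.875
```

<p>On this toy data the mean is about 7.07, while the median is 1.5. Also, 87.5% of links fall below the mean: a few head links drag the average far above what a typical link receives. For a long tail, the median or the full histogram is the safer summary.</p>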
<p><strong>Takeaways:</strong></p>
<ul>
<li>The long tail is long and powerful.</li>
<li>This small group of Power-Paulines is far more likely to downvote, because they use the site far more frequently.</li>
</ul>
<p>I&#8217;d like to thank Reddit for publicly exposing so many APIs and enabling this sort of analysis and exploration. Thank you.</p>
<p>&nbsp;</p>
<p>Portions of this post appeared on <a href="http://christopher-berry.blogspot.com/" target="_blank">Eyes On Analytics</a> the week of February 5, 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Commentary on the proposed telescreens</title>
		<link>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/</link>
		<comments>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/#comments</comments>
		<pubDate>Sun, 15 Jan 2012 20:37:28 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Strategic Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=806</guid>
		<description><![CDATA[You may have read something about the Samsung 7500 and 8000 series televisions, the ones with a camera installed in them, over the past few days. The tl;dr summary: &#8220;For Samsung&#8217;s 7500 and 8000 series TVs, all you have to do is say &#8220;Hi, TV,&#8221; when you walk into a room for the TV to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://adage.com/article/special-report-ces/tv-watch/232094/" target="_blank">You may have read something</a> about the Samsung 7500 and 8000 series televisions, the ones with a camera installed in them, over the past few days.</p>
<p><strong>The tl;dr summary:</strong></p>
<p><em>&#8220;For Samsung&#8217;s 7500 and 8000 series TVs, all you have to do is say &#8216;Hi, TV,&#8217; when you walk into a room for the TV to turn on and know who&#8217;s there.&#8221;</em></p>
<p><em>&#8220;Think of it: The tech means an advertiser or TV programmer could, for the first time, know which members of a Nielsen household are watching a show or an ad. Cisco has even developed a system meant to read facial expressions and determine whether you&#8217;re entertained or bored.&#8221;</em></p>
<p><em>&#8220;Many people in the living room are multitasking with other devices. &#8216;We&#8217;re paying for that,&#8217; said Rex Harris, innovations supervisor at SMGX, a unit of ad agency holding company Publicis Groupe. &#8216;If you&#8217;re looking at other screens, then you&#8217;re not paying attention. We would like to know if we&#8217;re getting accurate impressions.&#8217;&#8221;</em></p>
<p><strong>Commentary:</strong></p>
<p>Alright &#8211; so &#8211; a simple innovation, the webcam, is jumping from the PC/DVR into the TV, and a few folks come out to speculate about what it could mean. It all ends up sounding like the telescreens of <em>1984</em>, which, I&#8217;m 99% certain, is not what Samsung has or had in mind.</p>
<p><strong>Broadcast isn&#8217;t digital.</strong></p>
<p>Repeat: broadcast. isn&#8217;t. digital.</p>
<p><strong>This has implications:</strong></p>
<ul>
<li>There is enough inventory for targeted ads and offers in digital because the technology enables the creation of multiple ad treatments at scale. No such technology exists in the broadcast industry.</li>
<li>People already effectively segment themselves by TV show preference.</li>
<li>On-demand technologies like Netflix, and time-shifting technologies like streaming and DVRs, are already eroding the concentration of key market segments.</li>
<li>Plot the S-curve adoption rate of the technologies driving market fragmentation against the adoption of new, Big-Brother-enabled telescreens, and see which wins. (Hint: it&#8217;s time shifting and on-demand.)</li>
<li>You&#8217;re paying for junk impressions because we&#8217;re developing ad blindness, just like we&#8217;ve developed banner blindness.</li>
</ul>
<p>No amount of surveillance is going to change that fact.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Analytics Wednesday &#8211; October 26 &#8211; Wellington</title>
		<link>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/</link>
		<comments>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 14:30:29 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Analytics Strategy]]></category>
		<category><![CDATA[Complexity Analytics]]></category>
		<category><![CDATA[Complexity Economics]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Mobile Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=743</guid>
		<description><![CDATA[Web Analytics Wednesday is tonight at The Wellington, in downtown Toronto&#8217;s analytics alley. It&#8217;s generously supported by AT Internet. There are some 40 people &#8211; representing among the best of the best, who will be in attendance. It&#8217;s a great opportunity for web analysts, social analysts, marketing scientists, data scientists, hackers, developers, and usability professionals [...]]]></description>
			<content:encoded><![CDATA[<p>Web Analytics Wednesday is tonight at <a href="http://www.barwellington.ca/">The Wellington</a>, in downtown Toronto&#8217;s analytics alley. It&#8217;s generously supported by <a href="http://en.atinternet.com/">AT Internet</a>. There are some 40 people &#8211; representing among the best of the best, who will be in attendance. It&#8217;s a great opportunity for web analysts, social analysts, marketing scientists, data scientists, hackers, developers, and usability professionals to come out and talk about the great ideas and opportunities we have going on in Toronto.</p>
<p>It&#8217;s also the first get together after eMetrics New York, which was a major, and had big time Canadian attendance. These tend to be among the more interesting evenings. It has also been some three months since the last WAWTO event, so there should be quite a few fresh stories.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>eMetrics New York 2011</title>
		<link>http://christopherberry.ca/2011/10/emetrics-new-york-2011/</link>
		<comments>http://christopherberry.ca/2011/10/emetrics-new-york-2011/#comments</comments>
		<pubDate>Sun, 16 Oct 2011 17:22:29 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=735</guid>
		<description><![CDATA[I&#8217;ll be at eMetrics next week. I hope you will be too. It&#8217;ll be great to be back in New York. There are a few people that I&#8217;m looking forward to seeing: John Lovett on social media, Melinda Driscoll on web analytics, Shari Cleary on media, Joseph Stanhope on mobile, Alex Langshur on government. And [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be at <a href="http://www.emetrics.org/newyork/" target="_blank">eMetrics</a> next week. I hope you will be too.</p>
<p>It&#8217;ll be great to be back in New York.</p>
<p>There are a few people that I&#8217;m looking forward to seeing: John Lovett on social media, Melinda Driscoll on web analytics, Shari Cleary on media, Joseph Stanhope on mobile, Alex Langshur on government. And then there&#8217;s Michael Healy, Patrick Glinski, and me.</p>
<p>I&#8217;m presenting with Michael Healy on sentiment. Michael Healy is among the best thinkers in this space and is just great. There have been a few very recent breakthroughs in sentiment analysis over the summer (and as recently as last week), and I&#8217;m looking forward to explaining how to treat the measure. I understand a core problem with how the metric is applied: the gap between what some want it to mean and what it actually measures.</p>
<p>I&#8217;m with <a href="http://www.ideacouture.com/who-we-are/patrick-glinski" target="_blank">Patrick Glinski of Idea Couture</a> on Friday &#8211; presenting &#8220;Communicating Data to Designers&#8221;. It&#8217;s a really different topic, and something you won&#8217;t see anywhere else. It&#8217;s not on the radar yet as a differentiating competitive advantage. It&#8217;s new, it&#8217;s different, it&#8217;s fresh &#8211; and even a bit risky. So come on out. We won&#8217;t bite.</p>
<p>I&#8217;m looking forward to seeing you there.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/emetrics-new-york-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Our Mobile Planet &#8211; Select statistics for International Smartphone Penetration</title>
		<link>http://christopherberry.ca/2011/10/our-mobile-planet-select-statistics-for-international-smartphone-penetration/</link>
		<comments>http://christopherberry.ca/2011/10/our-mobile-planet-select-statistics-for-international-smartphone-penetration/#comments</comments>
		<pubDate>Mon, 10 Oct 2011 17:46:49 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Marketing Science]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=728</guid>
		<description><![CDATA[Have you seen this site, put out by Google for their &#8220;Our Mobile Planet&#8221; study? It&#8217;s an excellent way to present data in a very accessible, very explorable way. I found it inspiring. The call to action is &#8220;create your chart now&#8221;. A very good, honest, call to action. The technology adoption S-curve can be [...]]]></description>
			<content:encoded><![CDATA[<p>Have you seen <a href="http://www.ourmobileplanet.com/" target="_blank">this site</a>, put out by Google for their &#8220;Our Mobile Planet&#8221; study?</p>
<p>It&#8217;s an excellent way to present data in a very accessible, very explorable way. I found it inspiring.</p>
<p>The call to action is &#8220;create your chart now&#8221;. A very good, honest, call to action.</p>
<p>The technology adoption S-curve can be a slow beast, and expectations of growth have persistently outstripped actual adoption, at least in North America, and especially in Canada. Adoption has a few drags on it in North America and Europe. No such drags exist in Asia.</p>
<p>The chart below compares smartphone penetration across all the countries. (Click to embiggen.)</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/10/totalpenetration.png"><img class="aligncenter size-medium wp-image-730" title="totalpenetration" src="http://christopherberry.ca/wp-content/uploads/2011/10/totalpenetration-300x235.png" alt="" width="300" height="235" /></a></p>
<p>That chart masks the underlying maturity in each country. The chart below compares &#8216;at least one time&#8217; m-commerce usage across Germany, China, and the United States, three of the world&#8217;s biggest economies. (Click to embiggen.)</p>
<p>&nbsp;</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/10/purchasedOnASmartPhone.png"><img class="aligncenter size-medium wp-image-729" title="purchasedOnASmartPhone" src="http://christopherberry.ca/wp-content/uploads/2011/10/purchasedOnASmartPhone-300x112.png" alt="" width="300" height="112" /></a></p>
<p>That&#8217;s a good snapshot. M-commerce is thought of as a pretty big risk in North America.</p>
<p>In many ways, the melding of couponing with check-ins was the right bridge into mobile for the times. As we all watch Groupon careen into the inevitable mess, we all ask &#8216;what&#8217;s next&#8217;. I ask, &#8216;where&#8217;s the utility&#8217;, &#8216;how can mobile be used to salvage previously wasted parts of my day?&#8217;</p>
<p>I see m-commerce as predictable. Certain firms, like <a href="http://www.plasticmobile.com/" target="_blank">Plastic Mobile</a>, have extensive experience with eCommerce and understand mobile. They&#8217;re not going to replicate the pain of the nineties. There&#8217;s an inevitability to it.</p>
<p>Though, painfully punctuated.</p>
<p>This would have happened sooner if it weren&#8217;t for the double whammy of policy plus recession.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/our-mobile-planet-select-statistics-for-international-smartphone-penetration/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Moneyball and Analytics</title>
		<link>http://christopherberry.ca/2011/10/moneyball-and-analytics/</link>
		<comments>http://christopherberry.ca/2011/10/moneyball-and-analytics/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 17:42:45 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=722</guid>
		<description><![CDATA[The plot of Moneyball is fairly well known among analytics folks. It&#8217;s a relatable example of how to  compete on analytics. Many statisticians love baseball. It&#8217;s a natural extension. And it&#8217;s been written to death about in the pop-analytics literature. It&#8217;s good stuff. It&#8217;s a nice case study. John Lovett predicts that Moneyball will put [...]]]></description>
			<content:encoded><![CDATA[<p>The plot of Moneyball is fairly well known among analytics folks. It&#8217;s a relatable example of how to  compete on analytics. Many statisticians love baseball. It&#8217;s a natural extension. And it&#8217;s been written to death about in the pop-analytics literature. It&#8217;s good stuff. It&#8217;s a nice case study.</p>
<p><a title="Moneyball Analytics On The Map" href="http://john.webanalyticsdemystified.com/2011/09/21/moneyball-will-put-web-analytics-on-the-map/">John Lovett predicts that Moneyball will put analytics on the map</a>.</p>
<p>It&#8217;s likely. It&#8217;s just so damn relatable.</p>
<p>Ideas have a long journey from conception to popularization. Nash&#8217;s work had been known to game theorists, and to a subset of political scientists who don&#8217;t understand people, since the beginning. Most people didn&#8217;t learn of it until &#8216;A Beautiful Mind&#8217; came out. Moneyball is that movie.</p>
<p><strong>To extend the lesson from Moneyball:</strong></p>
<p>1. Everybody has the same mental model of the way the world works with respect to some phenomenon. (Experience E used to perform Task T with performance P as the outcome).</p>
<p>2. A dominant model dictates a dominant set of Key Performance Indicators.</p>
<p>3. Somebody discovers a better model and validates it through analytics.</p>
<p>4. Somebody effectively competes better as a result of the new model.</p>
<p>5. Others resist the new model (lock-in) and persist in the market until they die or are defeated.</p>
<p>6. New model becomes the new dominant one.</p>
<p>7. Return to (1) and repeat.</p>
<p>&nbsp;</p>
<p>***</p>
<p>I&#8217;m Christopher Berry.</p>
<p>I bridge the gap between <a href="http://www.webanalyticsassociation.org/members/blog_view.asp?id=538344" target="_blank">marketing science and data science</a>.</p>
<p>I welcome connections on <a href="http://twitter.com/cjpberry" target="_blank">Twitter</a> and <a href="http://www.linkedin.com/profile/view?id=26002267" target="_blank">LinkedIn</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/moneyball-and-analytics/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics in Toronto</title>
		<link>http://christopherberry.ca/2011/09/analytics-in-toronto/</link>
		<comments>http://christopherberry.ca/2011/09/analytics-in-toronto/#comments</comments>
		<pubDate>Sun, 11 Sep 2011 20:10:20 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=714</guid>
		<description><![CDATA[Analytics is alive and growing in Toronto. This post summarizes what I know I know. If you define analytics as being &#8216;the scientific method applied to data to generate sustainable advantage&#8217;, then there are three major concentrations of practitioners: finance, marketing and operations. The financial sector breaks out into the risk management and the speculation [...]]]></description>
			<content:encoded><![CDATA[<p>Analytics is alive and growing in Toronto. This post summarizes what I know I know.</p>
<p>If you define analytics as being &#8216;the scientific method applied to data to generate sustainable advantage&#8217;, then there are three major concentrations of practitioners: finance, marketing and operations.</p>
<p>The financial sector breaks out into the risk management and speculation fields. The risk management people in insurance form a more self-referential network than those on the banking side. The speculation analytics folks are at a severe disadvantage against their New York counterparts. If there&#8217;s a thriving hedge fund section in our community, I don&#8217;t know about it.</p>
<p>Marketing is divided between startups, CRM vendors, agencies, and client side (which includes data mining and web analytics). There are very few deep pockets of specialization &#8211; but they do exist in social analytics, mobile analytics, and web analytics. Extreme subject matter experts in Omniture, Coremetrics, and Webtrends number fewer than 10. Subject matter experts in Google Analytics number more than 30; however, that&#8217;s not to be confused with &#8216;people who have google analytics on their blog&#8217;.</p>
<p>There&#8217;s a thriving operations section of analytics &#8211; supply chain/logistics, transportation logistics, and internal operations. I also lump the impressively large gaming analytics (online gaming and gambling), and the minuscule dating optimization folk into this space.</p>
<p>There&#8217;s variety and specialization in analytics just within Toronto. And it&#8217;s pretty awesome.</p>
<p>The best way to break into analytics is to practice analytics. People who practice analytics in a context specific to something they enjoy are more likely to enjoy success and attract attention.</p>
<p>The best way to find somebody who knows analytics is to attend an industry event. They are frequent and welcoming.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/09/analytics-in-toronto/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics At Scale</title>
		<link>http://christopherberry.ca/2011/08/analytics-at-scale/</link>
		<comments>http://christopherberry.ca/2011/08/analytics-at-scale/#comments</comments>
		<pubDate>Sun, 21 Aug 2011 15:27:41 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Analytics Strategy]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=710</guid>
		<description><![CDATA[Two trends, an exponential increase in data produced, and a linear increase in the number of analysts produced per quarter, continue pose a massive challenge to businesses and analytics practices alike. We need both physical technology and social technology to practice analytics at scale. &#160; There are three grouping of physical technologies: First, there&#8217;s instrumentation [...]]]></description>
			<content:encoded><![CDATA[<p>Two trends, an exponential increase in data produced, and a linear increase in the number of analysts produced per quarter, continue pose a massive challenge to businesses and analytics practices alike.</p>
<p>We need both physical technology and social technology to practice analytics at scale.</p>
<p>&nbsp;</p>
<p>There are three groupings of physical technologies:</p>
<ul>
<li>First, there&#8217;s instrumentation technology that we use to measure and record the world around us.</li>
<li>Second, there&#8217;s analysis technology that we use to understand the data that&#8217;s coming in.</li>
<li>Third, there&#8217;s presentation technology that we use to communicate a world view, and what to do next.</li>
</ul>
<p>On the instrumentation technology side, we&#8217;ve all had a few challenges with instrumentation as of late. Specifically, <a href="http://analytics.blogspot.com/2011/08/update-to-sessions-in-google-analytics.html">the understanding of definitions, their impacts, and the unexpected impact of bugs</a>. Empathy from one technologist to another on this front. Instrumentation is not easy.</p>
<p>On the analysis side, there&#8217;s SPSS, R, SAS, Datameer, and Python. Amazing technologies, some of which may be used as controllers, some of which are used by analysts to peer into the deepest, most chaotic systems.</p>
<p>On the presentation technology side, we have Excel, PowerPoint, Keynote, and certain dashboarding technologies. They have pros and cons. XML or JSON APIs, or some version of them, ought to be the future here. It doesn&#8217;t seem like a big problem, but it&#8217;s a fairly wicked one, because credibility and authority are bound up in aesthetics.</p>
<p>Getting these three physical technologies right, linked up together, is very important to practice analytics at scale.</p>
<p>&nbsp;</p>
<p><strong>Social technology at scale</strong></p>
<p>People are incredibly important because they&#8217;re the ones who <a href="http://christopherberry.ca/science/communication/the-definition-of-insight/">generate insight</a>, and ultimately cause beneficial change that results in sustainable competitive advantage. It&#8217;s not the physical technology of software and hardware. The institutions that cause people to behave in very specific ways are a social technology, and they must be in place to scale.</p>
<p>There are a number of problems that an organization creates for analytics professionals.</p>
<p>For one, most organizations don&#8217;t know what they don&#8217;t know about analytics. They don&#8217;t understand that instrumentation is still young and buggy. That truth isn&#8217;t absolute. That sleuthing is part of the role. That it&#8217;s not just &#8220;pizza and spreadsheets&#8221;. That it takes time to put together a series of recommendations that make sense in a given system or context. That not every conveniently reasoned business case can be generated, or generated quickly. That not everything is recorded by the instrumentation. The first three months of setting up any new analytical institution are entirely about resetting expectations.</p>
<p>There are a number of problems that analytical leadership causes for their organizations.</p>
<p>For one, most organizations don&#8217;t know what they don&#8217;t know about analytics. Bad behaviors result. They hive off the data. They clam up indiscriminately. They refuse to engage with regions of the company for extended periods of time without a strategy in place for such cut-offs. They hire the wrong people. They don&#8217;t secure headcount for enough people to be successful. They&#8217;re unable to demonstrate their own ROI. They don&#8217;t say no often enough to be able to cause their own ROI. They don&#8217;t champion their own work. They acquire a siege mentality. They don&#8217;t publish and they don&#8217;t share their successes with industry. They churn rapidly.</p>
<p>Not all leadership is bad and not all organizations are poisonous to analytics.</p>
<p>If we accept the premise that organizations want sustainable competitive advantage from analytics, and that analytics leadership wants that same outcome, we can construct a physical technology stack and a social technology stack that achieves that end.</p>
<p>&nbsp;</p>
<p><strong>Current thinking on that end:</strong></p>
<p>1. Mediums and media are fragmenting. The most progressive thinking on the subject is moving towards medium planning (Syncapse, Teehan+Lax), and as a result, analytics leadership that resists new mediums is likely to be viewed as obstructionist. Instrumentation will fragment as a result. This is okay. Derive a medium measurement strategy. It&#8217;s what our collective leadership must become good at.</p>
<p>2. Analytics means having an analytical tool to use. SPSS is preferable because of usability. R is preferable in certain environments because it&#8217;s free. Firms that actively compete on analytics may require a big data stack to data mine very large sets.</p>
<p>3. Recommendations are communicated in PowerPoint; the business schools have decreed this. Collaborate on problems without a PowerPoint presentation; new thinking from the business schools has decreed this.</p>
<p>4. Every single organization has a C-level that always asks for gopher analyses. The rest of the organization typically pays no cost to support an analytical seat. These ad-hoc seats are an excellent way to hire new talent out of the universities and are good training opportunities, under supervision. Allow your experienced guns to find the insights, and let your interns and juniors do the gophering. Record the output of the gophering.</p>
<p>5. Nobody in the organization has incentive to acknowledge the analytics departments for insight discovery. The leadership of that department must make sure that there&#8217;s a solid internal culture that rewards insight.</p>
<p>&nbsp;</p>
<p>Most organizations desire analytics at scale &#8211; which means handling both the intelligence and ad hoc sectors of the business &#8211; simultaneously. The way to get there is by combining social and physical technologies that enable that scale. It will be an ongoing losing war, but every battle should bring victories.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/08/analytics-at-scale/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

