<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>ChristopherBerry.ca</title>
	<atom:link href="http://christopherberry.ca/feed/" rel="self" type="application/rss+xml" />
	<link>http://christopherberry.ca</link>
	<description></description>
	<lastBuildDate>Sun, 15 Apr 2012 17:49:19 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Testing Three Themes</title>
		<link>http://christopherberry.ca/2012/04/theme-testing/</link>
		<comments>http://christopherberry.ca/2012/04/theme-testing/#comments</comments>
		<pubDate>Sun, 15 Apr 2012 17:49:19 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=880</guid>
		<description><![CDATA[Post frequency on the analytics-focused blog, Eyes on Analytics, has increased to daily. In part, this is to solidify the understanding of the frequency-reach curve in blogging, and in part, it&#8217;s an attempt to understand where the broader market is at. I&#8217;m testing three themes: How to fight nature&#8217;s pesky way of inhibiting our [...]]]></description>
			<content:encoded><![CDATA[<p>Post frequency on the analytics-focused blog, <a title="Eyes on Analytics" href="http://christopher-berry.blogspot.ca/" target="_blank">Eyes on Analytics</a>, has increased to daily. In part, this is to solidify the understanding of the frequency-reach curve in blogging, and in part, it&#8217;s an attempt to understand where the broader market is at.</p>
<p><strong>I&#8217;m testing three themes:</strong></p>
<ul>
<li>How to fight nature&#8217;s pesky way of inhibiting our ability to make clean causal statements.</li>
</ul>
<ul>
<li>The importance of imagination in identifying independent variables.</li>
</ul>
<ul>
<li>The role of evidence in decision making.</li>
</ul>
<p>Simplification of a message is not pandering. However, many pandering statements are deliberate simplifications.</p>
<p><strong>If your optimization objective is to gain followers:</strong></p>
<ul>
<li>Post often.</li>
</ul>
<ul>
<li>Post simply.</li>
</ul>
<ul>
<li>Post what people want to hear.</li>
</ul>
<p>I&#8217;m choosing simplification while avoiding pandering.</p>
<p>Let&#8217;s see how that unfolds over the next 60 days.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/04/theme-testing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why don&#8217;t the campaign components add up?</title>
		<link>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/</link>
		<comments>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/#comments</comments>
		<pubDate>Sat, 17 Mar 2012 16:04:18 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=878</guid>
		<description><![CDATA[Sometimes the components of a marketing channel will not add up to equal the total performance of the marketing channel. This is caused by any number of realities and limitations imposed in part by nature, and, in part, by you, the marketer. Consider the following deliberately simple scenario: March 2012 Impressions: Total Digital Impressions Delivered: [...]]]></description>
			<content:encoded><![CDATA[<p>Sometimes the components of a marketing channel will not add up to equal the total performance of the marketing channel. This is caused by any number of realities and limitations imposed in part by nature, and, in part, by you, the marketer.</p>
<p>Consider the following <strong>deliberately simple</strong> scenario:</p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions with Chicken Creative:</strong> 25,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions with Beef Creative:</strong> 50,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions with Pork Creative:</strong> 75,000,000</li>
</ul>
<p>Something doesn&#8217;t make sense. I&#8217;m telling you that 100,000,000 impressions were delivered in total, but the components of that figure (25 million, 50 million, and 75 million) sum to 150 million, so they don&#8217;t actually add up.</p>
<p>That&#8217;s because creative can have multiple attributes. An ad may feature Chicken alone, Beef alone, or Pork alone. An ad may feature Beef with Pork. An ad may feature Chicken with Beef. An ad may feature Chicken with Pork. In a crazy twist, perhaps some creative features all three! (The madness!). Attributes can cause such complexity when it&#8217;s possible for a single thing to have multiple attributes.</p>
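<p>A minimal sketch of that overlap, assuming a made-up split of the 100,000,000 impressions across creatives. The per-creative figures are hypothetical and were chosen only so that the attribute totals match the scenario above; every impression is credited to each attribute its creative carries.</p>

```python
from collections import Counter

# Hypothetical split of the 100,000,000 impressions by creative.
# Each key is the set of attributes a single creative carries.
IMPRESSIONS_BY_CREATIVE = {
    frozenset({"chicken"}): 10_000_000,
    frozenset({"beef"}): 10_000_000,
    frozenset({"pork"}): 35_000_000,
    frozenset({"chicken", "beef"}): 5_000_000,
    frozenset({"chicken", "pork"}): 5_000_000,
    frozenset({"beef", "pork"}): 30_000_000,
    frozenset({"chicken", "beef", "pork"}): 5_000_000,
}


def totals_by_attribute(impressions_by_creative):
    """Credit every impression to each attribute its creative carries."""
    totals = Counter()
    for attributes, impressions in impressions_by_creative.items():
        for attribute in attributes:
            totals[attribute] += impressions
    return totals


delivered = sum(IMPRESSIONS_BY_CREATIVE.values())
by_attribute = totals_by_attribute(IMPRESSIONS_BY_CREATIVE)

print(delivered)                   # impressions actually delivered
print(dict(by_attribute))          # per-attribute totals
print(sum(by_attribute.values()))  # larger than delivered: overlaps are counted repeatedly
```

<p>Any creative with two attributes is counted twice in the attribute view and any creative with three is counted three times, which is why the attribute totals can exceed the delivered total without anything being wrong.</p>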
<p>The next scenario demonstrates complications that arise because of instrumentation:</p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions served to Males:</strong> 60,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions served to Females:</strong> 10,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions likely served to 35 to 50 year olds:</strong> 1,000,000</li>
</ul>
<p>All people have attributes, but not all people have attributes that can be measured.</p>
<p>It might very well be that for the XBOX Live component, Microsoft can report with greater certainty, owing to profile information, that the content was served to more males. And, because that particular app was geared towards males, there&#8217;s greater certainty on that end. It also might be the case that another component ran on mommy blogger ad networks. However, the ad targeter was really ethical and wasn&#8217;t uniquely tracking everybody, so the &#8216;missing 30 million impressions&#8217; aren&#8217;t missing.</p>
<p>The same goes for the age component. We may hypothesize because of Quantcast data that those impressions served on mommy blog networks were heavily 35 to 50 year old females, but, there&#8217;s nothing in the instrumentation itself that confirms that hypothesis.</p>
<p>Just because it may be measurable doesn&#8217;t guarantee that it will be measured.</p>
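<p>The gap between what was delivered and what was measured can be sketched with simple arithmetic. The figures are the ones from the gender scenario above; the function and variable names are my own, for illustration only.</p>

```python
TOTAL_DELIVERED = 100_000_000

# Impressions where the instrumentation reported a gender (figures from the scenario).
MEASURED_BY_GENDER = {"male": 60_000_000, "female": 10_000_000}


def unmeasured(total, measured):
    """Impressions delivered without the attribute ever being recorded."""
    return total - sum(measured.values())


gap = unmeasured(TOTAL_DELIVERED, MEASURED_BY_GENDER)
coverage = sum(MEASURED_BY_GENDER.values()) / TOTAL_DELIVERED

print(gap)       # delivered, but gender unknown
print(coverage)  # share of impressions with a measured gender
```

<p>Reporting the coverage next to the breakdown makes it clear that the components describe only the measured share of the total.</p>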
<p><strong>Finally, consider the complexity imposed by time:</strong></p>
<p>March 2012 Impressions:</p>
<ul>
<li><strong>Total Digital Impressions Delivered:</strong> 100,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions from Affiliate Program:</strong> 10,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions from the RayRayHayHay campaign:</strong> 8,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions from the A campaign:</strong> 1,000,000</li>
</ul>
<ul>
<li><strong>Total Impressions from the Eh campaign:</strong> 1,000,000</li>
</ul>
<p>Well, CLEARLY the A campaign and the Eh campaign failed &#8211; since the affiliates didn&#8217;t use those creative treatments much at all. What we don&#8217;t know is time.</p>
<ul>
<li><strong>Date the RayRayHayHay campaign creative was posted:</strong> January 5, 2012</li>
</ul>
<ul>
<li><strong>Date the A campaign creative was posted:</strong> March 1, 2012</li>
</ul>
<ul>
<li><strong>Date the Eh campaign creative was posted:</strong> March 28, 2012</li>
</ul>
<p>That&#8217;s 1 million impressions served in 4 days (March 28 through March 31) for the Eh campaign. That&#8217;s 1 million impressions served in 31 days for the A campaign.</p>
<p>Such component analysis is made particularly tricky when we&#8217;re trying to do it using a monthly report or some other arbitrary unit of time.</p>
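<p>One way to take time into account is to normalize each campaign by the days its creative was live inside the reporting month. This is a rough sketch under two assumptions of mine: calendar days are counted inclusively (so March 28 through March 31 counts as 4 days, where an exclusive count gives 3), and a creative posted before March is treated as live for the whole month.</p>

```python
from datetime import date

REPORT_START = date(2012, 3, 1)
REPORT_END = date(2012, 3, 31)

# (date posted, March impressions) for each campaign, taken from the scenario.
CAMPAIGNS = {
    "RayRayHayHay": (date(2012, 1, 5), 8_000_000),
    "A": (date(2012, 3, 1), 1_000_000),
    "Eh": (date(2012, 3, 28), 1_000_000),
}


def days_live(posted, start=REPORT_START, end=REPORT_END):
    """Calendar days the creative was live inside the report window, inclusive."""
    first_day = max(posted, start)
    return (end - first_day).days + 1


def daily_rate(posted, impressions):
    """Average impressions per live day within the report window."""
    return impressions / days_live(posted)


for name, (posted, impressions) in CAMPAIGNS.items():
    print(name, days_live(posted), round(daily_rate(posted, impressions)))
```

<p>On a per-day basis the Eh campaign looks far stronger than the A campaign, even though both show the same monthly total.</p>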
<p><strong>In sum:<br />
</strong></p>
<p>Channel performance analysis is not channel component analysis. These are two distinct types of analytics, aimed at answering two different classes of questions. For the reasons listed above (attribute overlap, instrumentation limitations, and time), the sum of the components may not add up to the total. This is not a devastating realization if you understand the differences and how to think of them.</p>
<p>There&#8217;s a general optimistic sense that drillability, the ability to drill into any metric and see its components, is possible in all contexts. It is possible in some contexts. It is not possible in all contexts. Privacy and technical disruption impose long-run constraints on ever being able to achieve that.</p>
<p>It&#8217;s not likely to be perfect any time soon, and, in some cases, the components won&#8217;t ever add up.</p>
<p>***</p>
<p>(Note to fellow analysts: I chose impressions to keep it really simple. On-site and post-click analysis is required. Statistical analysis exists for a reason, so, even armed with impression and CTR data, you may analyze performance across multiple attributes. Moreover, you ought to be aware of the biases that exist in your data set &#8211; is it the case that males really did respond better, or, is it the case that the instrumentation is just better at identifying males?)</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/03/why-dont-the-campaign-components-add-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Who&#8217;s Downvoting You On Reddit?</title>
		<link>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/</link>
		<comments>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/#comments</comments>
		<pubDate>Sun, 12 Feb 2012 14:09:44 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Marketing Science]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=850</guid>
		<description><![CDATA[So who keeps on downvoting you on Reddit? We&#8217;ll find out. But first &#8211; three notes: You may be familiar with Reddit. If you&#8217;re not &#8211; you can read this explanation about what Reddit is. To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset [...]]]></description>
			<content:encoded><![CDATA[<p>So who keeps on downvoting you on Reddit? We&#8217;ll find out.</p>
<p>But first &#8211; three notes:</p>
<ul>
<li>You may be familiar with <a href="http://www.reddit.com/" target="_blank">Reddit</a>. If you&#8217;re not &#8211; you can read this explanation about <a href="http://christopherberry.ca/whats-reddit/" target="_blank">what Reddit is</a>.</li>
</ul>
<ul>
<li>To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset is a 29MB gzip-compressed file and contains 7,405,561 votes from 31,927 users over 2,046,401 links. You can read about the <a href="http://christopherberry.ca/methodology-for-whos-downvoting-you-on-reddit/" target="_blank">methodology here</a>.</li>
<li>The file contains three columns &#8211; a vote, a userid, and a link. Only people who had their privacy settings set to open had that data read by an API. There is no meta-data about who these people are in real life (IRL) or even what was the nature of the content they were upvoting and downvoting.</li>
</ul>
<p><strong>So, who&#8217;s downvoting you on reddit?</strong></p>
<p>To find out, I took that huge file and transformed it into another one &#8211; boiling it down into a single username, how many times that username voted (numberofvotes), and the average of all their votes.</p>
<p>You can see below that _mike voted 26 times, and, if you take the average of all his votes, +1 for an upvote and -1 for a downvote, it turns out to be -.92. Basically, _mike didn&#8217;t like a lot of what he saw. In fact, _mike upvoted once (+1) and downvoted 25 times (-25). So (- 25) + (+1) is -24, and -24/26 is -.92.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-users.png"><img class="aligncenter  wp-image-829" title="reddit-users" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-users-300x64.png" alt="" width="390" height="83" /></a></p>
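<p>The transformation can be sketched roughly as follows. The rows are hypothetical stand-ins in the file&#8217;s three-column shape (vote, userid, link); only _mike&#8217;s 1 upvote and 25 downvotes come from the post, and the link ids and the second user are made up. This is not the tool I actually used, just the same idea in a few lines.</p>

```python
from collections import defaultdict

# Hypothetical rows in the file's three-column shape: (vote, userid, link).
# The link ids are made up; _mike's 1 upvote and 25 downvotes match the post.
rows = [(1, "_mike", "link_0")]
rows += [(-1, "_mike", f"link_{i}") for i in range(1, 26)]
rows += [(1, "some_user", "link_0"), (1, "some_user", "link_1")]


def summarize(rows):
    """Collapse vote rows into {userid: (numberofvotes, averagevote)}."""
    votes_by_user = defaultdict(list)
    for vote, userid, _link in rows:
        votes_by_user[userid].append(vote)
    return {
        userid: (len(votes), sum(votes) / len(votes))
        for userid, votes in votes_by_user.items()
    }


summary = summarize(rows)
numberofvotes, averagevote = summary["_mike"]
print(numberofvotes, round(averagevote, 2))
```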
<p>There are over 30,000 usernames here &#8211; and that&#8217;s a lot of data. It&#8217;s really important to visualize the data before you really get into any analysis. One way to do that is to run a histogram.</p>
<p><strong>To read the histogram below, remember:</strong></p>
<ul>
<li>Frequency means &#8216;the number of usernames that fall into this category or range&#8217;.</li>
</ul>
<ul>
<li>Numberofvotes means &#8216;the number of times a username voted.&#8217;</li>
</ul>
<ul>
<li>Mean is another word for average.</li>
</ul>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/votes-by-people.png"><img class="aligncenter size-full wp-image-834" title="votes-by-people" src="http://christopherberry.ca/wp-content/uploads/2012/02/votes-by-people.png" alt="" width="589" height="493" /></a><strong></strong></p>
<p><strong>There are three takeaways from the histogram above:</strong></p>
<ul>
<li>The average number of votes by a username was 234.</li>
</ul>
<ul>
<li>A large number of usernames didn&#8217;t vote very many times at all.</li>
</ul>
<ul>
<li>There are bumps at 1000 and 2000 votes. (If you&#8217;re interested as to why &#8211; see the <a href="http://christopherberry.ca/methodology-for-whos-downvoting-you-on-reddit/" target="_blank">Methodological</a> notes. Incidentally &#8211; this is why you should always visualize your data.)</li>
</ul>
<p>A histogram is built from a Frequency Table, which we&#8217;ll see below.</p>
<p><strong>The way to read a frequency table is:</strong></p>
<ul>
<li>The &#8216;Valid&#8217; column means &#8216;how many times a username voted&#8217;.</li>
</ul>
<ul>
<li>Frequency means &#8216;the number of usernames that fall into this category&#8217;.</li>
</ul>
<ul>
<li>Percent means &#8216;the percentage of all the usernames that those in this category represent&#8217;.</li>
</ul>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/numberofvotes-reddit.png"><img title="numberofvotes-reddit" src="http://christopherberry.ca/wp-content/uploads/2012/02/numberofvotes-reddit.png" alt="" width="455" height="509" /></a></p>
<p><strong>There are three takeaways from the Frequency Table above:</strong></p>
<ul>
<li>4877 of the usernames only voted one time (It&#8217;s likely they submitted a single link and never returned).</li>
</ul>
<ul>
<li>Note how both the percentages and number of usernames in each category decrease.</li>
</ul>
<ul>
<li>50.1% of all the usernames voted 20 times or less. (Look at the cumulative percent column and make sure that makes sense to you. We&#8217;re going to use this column later.)</li>
</ul>
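<p>For readers who want to build a frequency table themselves, here is a minimal sketch. The ten values are hypothetical and are not the Reddit data; the columns mirror Valid, Frequency, Percent, and Cumulative Percent.</p>

```python
from collections import Counter

# Hypothetical numberofvotes values, one per username (not the real data).
numberofvotes = [1, 1, 1, 2, 2, 3, 5, 8, 20, 400]


def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    total = len(values)
    table = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        table.append(
            (value, counts[value], 100 * counts[value] / total, 100 * running / total)
        )
    return table


table = frequency_table(numberofvotes)
for row in table:
    print(row)
```

<p>The cumulative percent column is just a running total of the percent column, which is why it always ends at 100.</p>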
<p>You may have heard the term &#8216;long tail&#8217; many times before. This is a demonstration of what that means. The bars on the histogram fall away to the right.</p>
<p>Recall that the average of all the votes a username made is called &#8216;averagevote&#8217;. If somebody was persistently downvoting links, they&#8217;d have a negative number. If they upvoted everything they saw, they&#8217;d have an averagevote of +1.</p>
<p>Read the histogram below.  <a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-vote-by-person.png"><img class="aligncenter size-full wp-image-831" title="average-vote-by-person" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-vote-by-person.png" alt="" width="580" height="492" /></a><strong>The three takeaways are:</strong></p>
<ul>
<li>Negativity follows a reverse long tail. (It really happens &#8211; see how the figures fall away to the left)</li>
<li>On average, usernames upvoted what they saw (average 0.79).</li>
<li>There are bumps at 0 (related to a methodological note) and at -1.</li>
</ul>
<p>By now, two of my good friends in London are screaming at the screen. Means are a horrible way to explain long tail distributions. You can see that now too. Means are giving us a pretty skewed view of the world.</p>
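<p>A quick illustration of why the mean misleads on a long tail, using a small hypothetical sample rather than the real data:</p>

```python
from statistics import mean, median

# Hypothetical long-tailed sample: most usernames vote a few times, one votes a lot.
numberofvotes = [1, 1, 2, 3, 5, 20, 40, 2000]

sample_mean = mean(numberofvotes)
sample_median = median(numberofvotes)

print(sample_mean)    # pulled upward by the single heavy voter
print(sample_median)  # closer to what a typical username did
```

<p>Seven of the eight usernames in this sample sit far below the mean, which is the same pattern the histograms show.</p>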
<p>The table below is a byproduct of our Frequency table. It&#8217;s aptly labeled &#8216;Statistics&#8217;, and compares these two variables, numberofvotes and averagevote, side by side. I&#8217;ve thrown a yellow box around &#8216;percentiles&#8217;. Recall the cumulative column from the previous frequency table.</p>
<ul>
<li>22.8% of all usernames voted 2 times or less.</li>
<li>40.8% voted 9 times or less.</li>
</ul>
<p>The program I&#8217;m using is giving me &#8216;break points&#8217; for those percentiles.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-percentiles.png"><img class="aligncenter size-full wp-image-830" title="reddit-percentiles" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-percentiles.png" alt="" width="639" height="477" /></a></p>
<p><strong>Two takeaways:</strong></p>
<ul>
<li>The median gives a better summary of what&#8217;s going on here &#8211; half of the usernames voted 20 times or less, and, another set of usernames always upvoted what they saw.</li>
<li>If I know that roughly 80% of all usernames voted 325 times or less, then I know that roughly 20% of the usernames in my sample voted more than 325 times.</li>
</ul>
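<p>Percentile break points like these can be computed in most tools. Here is a sketch with Python&#8217;s standard library on a hypothetical sample. The exact cut points depend on the interpolation method, so they will not necessarily match the ones my program reported.</p>

```python
from statistics import quantiles

# Hypothetical numberofvotes values, one per username (not the real data).
numberofvotes = [
    1, 1, 1, 2, 2, 3, 4, 6, 9, 12,
    20, 35, 48, 90, 150, 325, 700, 1000, 2000, 2000,
]

# n=5 asks for the four cut points that split the sample into five equal-sized groups.
cut_points = quantiles(numberofvotes, n=5)
print(cut_points)
```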
<p>We&#8217;re going to use those percentile cutoff points to inform a segmentation, next.</p>
<p><strong>Segmentation</strong></p>
<p>A segmentation is a grouping of records, usually people, into categories. There is no prescription for how to do this. If you talk to a modeller, they&#8217;ll tell you about their clustering algorithms. If you talk to a machine learning scientist, they&#8217;ll tell you about bump-hunting or unsupervised machine learning clustering. Those are all very good algorithms. I use them myself.</p>
<p>I&#8217;m going for simplicity here. I have these four percentile cut-off points that evenly cut people into five categories. And, for further simplicity, instead of referring to a group of people who voted between 9 and 48 times as &#8216;those who voted between 9 and 48 times&#8217;, I&#8217;m going to call them Average-Andy&#8217;s. And I&#8217;ll just keep on calling them that.</p>
<p>At this point, I don&#8217;t know if they&#8217;re male or female. (And we won&#8217;t in this thread). And it&#8217;s controversial to use alliteration. But it&#8217;s done.</p>
<p>So, mapping the percentiles against a segmentation, based on how many times a username voted, we have:</p>
<ul>
<li>1 time: One-Time-Oliver</li>
</ul>
<ul>
<li>2 to 9 times: Vanity-Vanessa</li>
</ul>
<ul>
<li>9 to 48 times: Average-Andy</li>
</ul>
<ul>
<li>48 to 325 times: Frequent-Fred</li>
</ul>
<ul>
<li>More than 325 times: Power-Pauline</li>
</ul>
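<p>The mapping can be written as a small function. The ranges above share their boundary values (9 and 48), so this sketch makes an assumption: each upper bound is inclusive, in line with the &#8216;or less&#8217; wording of the percentiles. The function name is mine.</p>

```python
def segment(numberofvotes: int) -> str:
    """Map a vote count to a segment name, checking the largest range first.

    Assumes each upper bound is inclusive: 9 votes is a Vanity-Vanessa,
    48 is an Average-Andy, and 325 is a Frequent-Fred.
    """
    if numberofvotes > 325:
        return "Power-Pauline"
    if numberofvotes > 48:
        return "Frequent-Fred"
    if numberofvotes > 9:
        return "Average-Andy"
    if numberofvotes > 1:
        return "Vanity-Vanessa"
    return "One-Time-Oliver"


for votes in (1, 9, 48, 325, 2000):
    print(votes, segment(votes))
```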
<p>Take a look at the result below &#8211; a variable I&#8217;m calling &#8216;equalseg&#8217; &#8211; short for &#8216;equal segmentation&#8217;.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/equalsegs.png"><img class="aligncenter size-full wp-image-835" title="equalsegs" src="http://christopherberry.ca/wp-content/uploads/2012/02/equalsegs.png" alt="" width="528" height="190" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>There are 4877 One-Time-Olivers, representing 15.5% of the usernames in the sample.</li>
</ul>
<ul>
<li>Vanity-Vanessa&#8217;s represent 23.9% of the usernames.</li>
</ul>
<ul>
<li>The last three segments are pretty equally divided &#8211; the first two are more lopsided.</li>
</ul>
<p>Even though I aimed to have five groups of people with equal numbers in each, you can see the division between One-Time-Olivers and Vanity-Vanessa&#8217;s is off. This happens very often when segmenting a long tail into equal groups. And, while not ideal, it&#8217;s okay for our purposes.</p>
<p>Next, we&#8217;re going to examine each segment individually.</p>
<p><strong>One-Time-Olivers</strong></p>
<p>There are very efficient ways that statisticians quickly summarize and understand the relationship among variables. The aim here isn&#8217;t to be efficient &#8211; but to be clear. In that spirit, I give you the histogram below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-votes.png"><img class="aligncenter size-full wp-image-842" title="one-time-oliver-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-votes.png" alt="" width="553" height="485" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>All 4877 One-Time-Olivers voted exactly one time.</li>
</ul>
<p>You should lol. It makes sense though, right? And, the segment name should make a lot more sense.</p>
<p>The histogram below summarizes how, on average, One-Time-Olivers voted &#8211; positive or negative. Since they only voted one time, it&#8217;s either an upvote, or a downvote. A +1 or -1 average.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-averagevote.png"><img class="aligncenter size-full wp-image-841" title="one-time-oliver-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/one-time-oliver-averagevote.png" alt="" width="567" height="489" /></a></p>
<p><strong> Takeaways:</strong></p>
<ul>
<li>One-Time-Oliver&#8217;s tend to upvote once, and are never heard from again.</li>
<li>In answering the question &#8211; &#8220;Who&#8217;s downvoting you on Reddit&#8221;, it isn&#8217;t One-Time-Olivers.</li>
</ul>
<p>&nbsp;</p>
<p><strong> Vanity Vanessa</strong></p>
<p>Vanity accounts frequently enter Reddit, they flicker, and they go out. They get discouraged. They never really commit to the bit. That&#8217;s what happens to them. The histogram below takes on that familiar long-tail curve.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-vote.png"><img class="aligncenter size-full wp-image-844" title="vanity-vanessa-vote" src="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-vote.png" alt="" width="571" height="483" /></a><strong>Takeaways:</strong></p>
<ul>
<li>There are a lot of Vanity-Vanessa&#8217;s, some 7,527 of them.</li>
</ul>
<ul>
<li>Most of them voted only 2, 3, or 4 times.</li>
</ul>
<p>So, how did they vote?</p>
<p>The histogram below summarizes the story:</p>
<p>&nbsp;</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-averagevote.png"><img class="aligncenter  wp-image-843" title="vanity-vanessa-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/vanity-vanessa-averagevote.png" alt="" width="565" height="491" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Vanity-Vanessa&#8217;s upvoted nearly everything they saw, with very few exceptions.</li>
</ul>
<ul>
<li>Very few persistently downvoted everything they saw.</li>
</ul>
<ul>
<li>They&#8217;re not the ones downvoting you on Reddit.</li>
</ul>
<p>&nbsp;</p>
<p><strong>Average-Andy&#8217;s</strong></p>
<p>Recall that the average username votes 234 times, and yet, I still labeled the segment ranging between 9 and 48 votes as Average-Andy. That&#8217;s because the mean number of votes that Average-Andy&#8217;s cast is 22.25 &#8211; which is close to the median of 20 for the entire set.</p>
<p>This mixing and abstraction of median, mean, and segmentation isn&#8217;t something that I expect most people to consider or think about, but I can foresee some getting hung up on it. When you think about an equal segmentation though, it makes sense that the mean of your middle category should be close to the median of the entire set.</p>
<p>For everybody else &#8211; just know that you&#8217;re looking at the &#8220;average joe redditor&#8221; here.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-votes.png"><img class="aligncenter size-full wp-image-840" title="average-andy-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-votes.png" alt="" width="614" height="492" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Average number of votes is 22.25, close to the median of 20 for the whole set.</li>
<li>Familiar long tail.</li>
</ul>
<p>How do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-averagevote.png"><img class="aligncenter size-full wp-image-845" title="average-andy-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/average-andy-averagevote.png" alt="" width="560" height="485" /></a><strong>Takeaways:</strong></p>
<ul>
<li>A majority of Average-Andy&#8217;s liked everything they saw &#8211; they upvoted everything.</li>
</ul>
<ul>
<li>They downvote more often than Vanity-Vanessa&#8217;s or One-Time-Oliver&#8217;s, but not massively.</li>
</ul>
<ul>
<li>They aren&#8217;t downvoting heavily enough to say that these are the ones downvoting you on Reddit.</li>
</ul>
<p>&nbsp;</p>
<p><strong>Frequent Fred</strong></p>
<p>By now you&#8217;re pretty much a pro at reading these histograms. Frequent Fred&#8217;s vote frequently. Look at the histogram below.</p>
<p>&nbsp;</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-votes.png"><img class="aligncenter size-full wp-image-847" title="frequent-fred-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-votes.png" alt="" width="571" height="493" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>Classic long-tail continues.</li>
</ul>
<ul>
<li>Averaging 139.3 votes.</li>
</ul>
<ul>
<li>The unusual bump at the beginning of the series is just magnified by the scale from the previous vote frequency histogram. (It&#8217;s fine).</li>
</ul>
<p>How do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-averagevote.png"><img class="aligncenter size-full wp-image-846" title="frequent-fred-averagevote" src="http://christopherberry.ca/wp-content/uploads/2012/02/frequent-fred-averagevote.png" alt="" width="564" height="480" /></a><strong>Takeaways:</strong></p>
<ul>
<li>Far fewer of them are likely to upvote absolutely everything they see.</li>
</ul>
<ul>
<li>There&#8217;s significant flattening of the long tail &#8211; the average is .74.</li>
</ul>
<ul>
<li>More of them, on average, are disposed to downvoting.</li>
</ul>
<p><strong>Power Paulines</strong></p>
<p>Power Paulines are the most difficult group to analyze, but the easiest to summarize and understand. Take a look at the histogram below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-votes.png"><img class="aligncenter size-full wp-image-849" title="power-pauline-votes" src="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-votes.png" alt="" width="585" height="489" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>The long tail is holding &#8211; there&#8217;s significant clustering at 1000 and 2000.</li>
</ul>
<ul>
<li>The cause is related to rate limiting within the Reddit API.</li>
</ul>
<ul>
<li>The longest part of the long tail &#8211; those power users with thousands and thousands of votes, are all bundled and clustered together at 2000.</li>
</ul>
<ul>
<li>There are around 500 such power users, representing some 1.5% of the total usernames.</li>
</ul>
<p>So how do they vote?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-voteaverage.png"><img class="aligncenter size-full wp-image-848" title="power-pauline-voteaverage" src="http://christopherberry.ca/wp-content/uploads/2012/02/power-pauline-voteaverage.png" alt="" width="566" height="488" /></a></p>
<p><strong> Takeaways:</strong></p>
<ul>
<li>The bump at 0 is caused by 1000 upvotes getting averaged out by 1000 downvotes.</li>
</ul>
<ul>
<li>0&#8242;s aside, which are tugging on the mean, Pauline&#8217;s are on average more prone to downvoting.</li>
</ul>
<ul>
<li>Power Paulines are downvoting you on Reddit.</li>
</ul>
<p>&nbsp;</p>
<p><strong>Putting a bow on it</strong></p>
<p>The chart below summarizes the relationship between segment and their average vote. You can see a clear negative direction. The more one uses Reddit, the more one downvotes &#8211; even if the mean is exaggerated in the Power Pauline segment.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-summary.png"><img class="aligncenter size-full wp-image-838" title="reddit-summary" src="http://christopherberry.ca/wp-content/uploads/2012/02/reddit-summary.png" alt="" width="544" height="355" /></a></p>
<p>To really hammer the point home about the origin of downvotes, take a look at the table below. It&#8217;s broken out by the segments you understand. It also contains two new variables &#8211; upvotes and downvotes. That is the total count of the number of upvotes and downvotes made by each segment.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/TotalSUM.png"><img class="aligncenter size-full wp-image-854" title="TotalSUM" src="http://christopherberry.ca/wp-content/uploads/2012/02/TotalSUM.png" alt="" width="656" height="875" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>One-Time Olivers as a group were responsible for 175 of all the downvotes cast.</li>
</ul>
<ul>
<li>Vanity-Vanessa&#8217;s as a group were responsible for 1781 of all the downvotes cast.</li>
</ul>
<ul>
<li>Average-Andy&#8217;s as a group were responsible for 13,258 of all the downvotes cast.</li>
</ul>
<ul>
<li>Frequent-Freds as a group were responsible for 120,758 of all the downvotes cast.</li>
</ul>
<ul>
<li>Power-Paulines as a group were responsible for 1,672,368 of all the OBSERVED downvotes cast &#8211; but are probably responsible for a lot more in aggregate across all of Reddit. (This sample contains a bias, but bias doesn&#8217;t mean I can&#8217;t say anything at all about anything.)</li>
</ul>
<p>Note the differences in order of magnitude between each group. 1781 is roughly 10 times greater than 175. And so it goes, a bit imperfectly, on the way up to Frequent-Fred&#8217;s. There&#8217;s an order of magnitude difference here in terms of the amount of weight each group casts.</p>
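<p>The step-ups between segments can be checked directly from the downvote counts listed in the takeaways. This is a small sketch; the variable names are mine.</p>

```python
# Downvotes cast by each segment, as listed in the takeaways.
DOWNVOTES = {
    "One-Time-Oliver": 175,
    "Vanity-Vanessa": 1_781,
    "Average-Andy": 13_258,
    "Frequent-Fred": 120_758,
    "Power-Pauline": 1_672_368,
}

total = sum(DOWNVOTES.values())
names = list(DOWNVOTES)

# Ratio of each segment's downvotes to the previous segment's.
ratios = {
    later: DOWNVOTES[later] / DOWNVOTES[earlier]
    for earlier, later in zip(names, names[1:])
}

# Share of all observed downvotes cast by each segment.
shares = {name: count / total for name, count in DOWNVOTES.items()}

for name in names:
    print(name, round(shares[name], 4), round(ratios.get(name, 0), 1))
```

<p>Each step up is roughly a factor of 7 to 14, and the last segment alone accounts for over 90% of the observed downvotes.</p>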
<p><strong>The greatest power users of Reddit are the ones who are downvoting you &#8211; and it&#8217;s an exponential power.</strong></p>
<p>&nbsp;</p>
<p><strong>But wait, there&#8217;s more.</strong></p>
<p>Recall, however, that there were over 7 million votes cast. 1.8 million were downvotes, and 5.5 million were upvotes. Read the statistics table below to verify that.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/redditvotes.png"><img class="aligncenter size-full wp-image-837" title="redditvotes" src="http://christopherberry.ca/wp-content/uploads/2012/02/redditvotes.png" alt="" width="455" height="383" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>Upvotes outnumber downvotes.</li>
</ul>
<ul>
<li>The interface of Reddit itself causes upvotes to accumulate.</li>
</ul>
<ul>
<li>Reddit itself is a cause of a bias &#8211; probably by design.</li>
</ul>
<p>The histogram below is by links &#8211; the content getting upvoted or downvoted. There were just over 2 million links submitted. On average, each link received 3.62 votes. Given everything you know about long tails, think about just how deceptive that 3.62 mean figure is. Note how you can&#8217;t even see the bumps in the tail. And be in awe of the efficiency of the collective Reddit behavior that causes popular content to be disproportionately promoted while even &#8216;good&#8217; or &#8216;average&#8217; content gets relentlessly shifted to the left &#8211; all by a very small group of people.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2012/02/votes-per-link.png"><img class="aligncenter size-full wp-image-839" title="votes-per-link" src="http://christopherberry.ca/wp-content/uploads/2012/02/votes-per-link.png" alt="" width="598" height="495" /></a></p>
<p><strong>Takeaways:</strong></p>
<ul>
<li>The long tail is long and powerful.</li>
</ul>
<ul>
<li>This small group of Power-Paulines is far more likely to downvote because of a much higher frequency of use.</li>
</ul>
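<p>To make the point about that deceptive 3.62 mean concrete, here is a minimal sketch. The ten upvote counts are made up for illustration and are not drawn from the Reddit sample; they are simply chosen so that one link dominates, the way a long tail does.</p>

```python
import statistics

# Made-up upvote counts for ten links: most get almost nothing, one takes off.
upvotes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 28]

mean_votes = statistics.mean(upvotes)      # pulled upward by the single hit
median_votes = statistics.median(upvotes)  # what a typical link actually gets
top_share = max(upvotes) / sum(upvotes)    # share of all upvotes held by the top link

print(f"mean={mean_votes:.2f} median={median_votes} top link share={top_share:.0%}")
```

<p>In this toy set the mean is 3.6 upvotes per link, yet the median link gets 1, and a single link holds more than three quarters of all the upvotes. A mean on its own hides all of that.</p>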
<p>I&#8217;m thanking Reddit for publicly exposing so many APIs and enabling this sort of analysis and exploration. Thank you.</p>
<p>&nbsp;</p>
<p>Portions of this post appeared on <a href="http://christopher-berry.blogspot.com/" target="_blank">Eyes On Analytics</a> the week of February 5, 2012.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/02/whos-downvoting-you-on-reddit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Commentary on the proposed telescreens</title>
		<link>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/</link>
		<comments>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/#comments</comments>
		<pubDate>Sun, 15 Jan 2012 20:37:28 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Strategic Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=806</guid>
		<description><![CDATA[You may have read something about the Samsung 7500 and 8000 series televisions, the ones with a camera installed in them, over the past few days. The tl;dr summary: &#8220;For Samsung&#8217;s 7500 and 8000 series TVs, all you have to do is say &#8220;Hi, TV,&#8221; when you walk into a room for the TV to [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://adage.com/article/special-report-ces/tv-watch/232094/" target="_blank">You may have read something</a> about the Samsung 7500 and 8000 series televisions, the ones with a camera installed in them, over the past few days.</p>
<p><strong>The tl;dr summary:</strong></p>
<p><em>&#8220;For Samsung&#8217;s 7500 and 8000 series TVs, all you have to do is say &#8220;Hi, TV,&#8221; when you walk into a room for the TV to turn on and know who&#8217;s there.&#8221;</em></p>
<p><em>&#8220;Think of it: The tech means an advertiser or TV programmer could, for the first time, know which members of a Nielsen household are watching a show or an ad. Cisco has even developed a system meant to read facial expressions and determine whether you&#8217;re entertained or bored.&#8221;</em></p>
<p><em>&#8220;Many people in the living room are multitasking with other devices. &#8220;We&#8217;re paying for that,&#8221; said Rex Harris, innovations supervisor at SMGX, a unit of ad agency holding company Publicis Groupe. &#8220;If you&#8217;re looking at other screens, then you&#8217;re not paying attention. We would like to know if we&#8217;re getting accurate impressions.&#8221;&#8221;</em></p>
<p><strong>Commentary:</strong></p>
<p>Alright &#8211; so &#8211; a simple innovation, the webcam, is jumping from the PC/DVR into a TV, and we get a few folks who come out and speculate what it could mean. It all ends up sounding like a 1984 telescreen idea, which, I&#8217;m 99% certain, is not what Samsung has/had in mind.</p>
<p><strong>Broadcast isn&#8217;t digital.</strong></p>
<p>Repeat: broadcast. isn&#8217;t. digital.</p>
<p><strong>This has implications:</strong></p>
<ul>
<li>There is enough inventory for targeted ads and offers in digital because the technology enables the creation of multiple ad treatments at scale. No such technology exists in the broadcast industry.</li>
</ul>
<ul>
<li>People already effectively segment themselves by TV show preference.</li>
</ul>
<ul>
<li>On-demand technologies like Netflix, and time-shifting technologies like streaming and DVRs, are already eroding the concentration of key market segments.</li>
</ul>
<ul>
<li>Plot the S-curve adoption rate of the technologies driving market fragmentation against the adoption of new, Big-Brother enabled telescreens, and see which wins. (Hint: it&#8217;s time shifting and on-demand).</li>
</ul>
<ul>
<li>You&#8217;re paying for junk impressions because we&#8217;re developing ad blindness, just like we&#8217;ve developed banner blindness.</li>
</ul>
<p>No amount of surveillance is going to change that fact.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2012/01/commentary-on-the-proposed-telescreens/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Find Hidden Patterns in Big Data &#8211; A Commentary on MINE, Reshef et al (2011)</title>
		<link>http://christopherberry.ca/2011/12/find-hidden-patterns-in-big-data-a-commentary-on-mine-reshef-et-al-2011/</link>
		<comments>http://christopherberry.ca/2011/12/find-hidden-patterns-in-big-data-a-commentary-on-mine-reshef-et-al-2011/#comments</comments>
		<pubDate>Sun, 18 Dec 2011 22:10:11 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Complexity Analytics]]></category>
		<category><![CDATA[Data Science]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=803</guid>
		<description><![CDATA[You may have read something about &#8216;Detecting Novel Associations in Large Data Sets&#8217;, a paper appearing in Science, 334, 1518 (2011) by David N. Reshef et al.. You can check out the software here. This is an initial commentary and an explanation about what it&#8217;s all about. The Longer You Look, The More Likely Error [...]]]></description>
			<content:encoded><![CDATA[<p>You may have read something about &#8216;Detecting Novel Associations in Large Data Sets&#8217;, a paper appearing in Science, 334, 1518 (2011) by David N. Reshef et al. You can check out the software <a href="http://www.exploredata.net/">here</a>.</p>
<p>This is an initial commentary and an explanation about what it&#8217;s all about.</p>
<p><strong>The Longer You Look, The More Likely Error will Find You</strong></p>
<p>Take a very large dataset, say, all the customers of AT&amp;T and their calling records 2001-2011, and divide it into two random but equal sets. Say you didn&#8217;t have any hypothesis at all. You just wanted to see which features were related to each other in that set. Say, each customer record has 5000 features, including gender, date of birth, credit score, average call durations, most frequently dialed number, and so on. (Note to statisticians: Assume a Pearson R correlation matrix, skip next paragraph).</p>
<p>Assume, further, that you&#8217;re going to compare each feature against every other. So, you compare all the ages against all the dates of birth, then all the ages against credit scores, and so on. The strength of the relationship between any two features is expressed by a single number. The higher that number is, the stronger the relationship between the two. For instance, we might find that credit score and age are tightly correlated &#8211; the older one is, the more likely one&#8217;s credit score is to be positive.</p>
<p>You&#8217;re likely to find clearly incorrect relationships in such a large table, just by accident. You might find in Dataset A, for instance, that there&#8217;s a statistically significant relationship between being a Virgo and having a negative credit score. There might be a relationship between average call duration and being a Capricorn. You know that such a result doesn&#8217;t make sense. Why would zodiac sign (derived from date of birth) affect those things? The way that chance works in such large tables is that the longer you look for significant features, the more likely it is that you&#8217;ll find a relationship that doesn&#8217;t in fact hold in the real world.</p>
<p>In fact, most of those relationships would disappear in Dataset B. However, new, clearly untrue relationships would appear in Dataset B that don&#8217;t exist in Dataset A. When you&#8217;re dealing with thousands of features, the likelihood of such phenomena increases. And that&#8217;s even holding everything we know about probability to be true.</p>
<p>In sum, a big reason why you go into a dataset with a hypothesis is to reduce the risk of coming up with something that is wrong, and very unlikely to be repeatable in other datasets.</p>
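<p>The Dataset A versus Dataset B effect can be sketched in a few lines. This is a toy simulation under stated assumptions, not an analysis of any real calling records: every feature is pure random noise, the sizes (200 rows, 40 features) and the seed are arbitrary, and the 1.96 / sqrt(n) cut-off is only a rough stand-in for a two-sided 5% significance test on Pearson R.</p>

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def flagged_pairs(rows, n_features, threshold):
    """Return the set of feature pairs whose |r| clears the threshold."""
    columns = [[row[j] for row in rows] for j in range(n_features)]
    return {
        (i, j)
        for i, j in itertools.combinations(range(n_features), 2)
        if abs(pearson_r(columns[i], columns[j])) > threshold
    }


random.seed(2011)             # arbitrary seed, only for repeatability
N_ROWS, N_FEATURES = 200, 40  # hypothetical sizes, far smaller than 5000 features

# Every value is independent noise, so no feature is truly related to any other.
rows = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_ROWS)]

half = N_ROWS // 2
dataset_a, dataset_b = rows[:half], rows[half:]

# Rough two-sided 5% cut-off for r when there is no real relationship.
threshold = 1.96 / math.sqrt(half)

flagged_a = flagged_pairs(dataset_a, N_FEATURES, threshold)
flagged_b = flagged_pairs(dataset_b, N_FEATURES, threshold)
both = flagged_a.intersection(flagged_b)

n_pairs = N_FEATURES * (N_FEATURES - 1) // 2
print(f"{n_pairs} pairs tested; A flags {len(flagged_a)}, "
      f"B flags {len(flagged_b)}, both flag {len(both)}")
```

<p>With 780 pairs and a 5% cut-off, each half should flag a few dozen &#8216;significant&#8217; pairs out of pure noise, and very few of them should be flagged in both halves. Checking a finding against a held-out half is one cheap way to catch the Virgo-and-credit-score kind of result.</p>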
<p><strong>Linear, Cubic, Exponential, Parabolic, Ellipse</strong></p>
<p>Not all relationships are straight lines. Indeed, especially in certain types of logistic regression, we can get very amazing, very beautiful and complex shapes separating one case from another. Diaper usage plotted against age is a parabolic relationship. Think about it. You use a lot of them when you&#8217;re young, you go through a lot of them when you&#8217;re very old. You don&#8217;t need many of them in the years in between. Linear regression wouldn&#8217;t perform very well in detecting that pattern.</p>
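<p>A minimal sketch of that blind spot, assuming an idealised and entirely made-up usage curve: usage is defined here as an exact parabola centred on age 45, so it is completely determined by age, and yet the Pearson R against raw age comes out at essentially zero.</p>

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


ages = list(range(0, 91))                  # ages 0 through 90
usage = [(age - 45) ** 2 for age in ages]  # made-up U-shaped usage, arbitrary units

# A straight-line view of the data: correlate usage with raw age.
r_linear = pearson_r(ages, usage)

# The same data once the curve is taken into account: correlate usage
# with squared distance from the midpoint (the shape it was built from).
curved_feature = [(age - 45) ** 2 for age in ages]
r_curved = pearson_r(curved_feature, usage)

print(f"r against raw age: {r_linear:.3f}")
print(f"r against squared distance from 45: {r_curved:.3f}")
```

<p>The second correlation is perfect only because the toy data was built that way. The point is the first number: a relationship can be total and still be invisible to a straight-line measure.</p>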
<p><strong>Enter Reshef et al and MIC</strong></p>
<p>MIC stands for Maximal Information Coefficient. Reshef et al invented a neat way of looking at relationships between variables that doesn&#8217;t rely solely on a key statistical test (Pearson R) to indicate that a relationship is there. The authors demonstrated how MIC manages to detect correlations between all these complex relationship types &#8211; Cubic, Exponential, Sinusoidal &#8211; and does it really well. They went further. They created a program that can mine very large datasets and suggest relationships to examine.</p>
<p><strong>What&#8217;s the Problem?</strong></p>
<p>Remember that &#8216;the longer you look, the more likely you&#8217;ll find something false&#8217; idea? The entire idea of hypothesis testing as the basis of quantitative analysis is an entrenched one. It&#8217;s an idea that causes resistance to advanced machine learning algorithms and pattern discovery. Reshef really did a great job in explaining the purpose of MIC. Reshef has merely stated that this is a hypothesis-informing machine. You can use the program and MIC to discover relationships that were once really quite hidden. Or very, very difficult to discover without insanely expensive software. I think this is great.</p>
<p><strong>The Opportunity</strong></p>
<p>We&#8217;re generating huge amounts of data. The big-feature, big-data problem is increasingly common. This is a great tool to rapidly inform hypotheses &#8211; to become smarter before getting smarter. It&#8217;s a welcome advancement, and worthy of attention.</p>
<p>If you hear of MIC, just know that a MIC of 0.00 means that there is no correlation between two variables, and that a MIC of 1.00 indicates a perfect correlation between two variables. Be aware that a high MIC does not imply a linear relationship between the variables; the relationship may be a much higher-order function. The next questions you should ask upon hearing a MIC score are &#8216;at what confidence level is it significant?&#8217; and &#8216;what kind of relationship is it?&#8217;. Then deep dive.</p>
<p><strong>I&#8217;m excited. </strong></p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/12/find-hidden-patterns-in-big-data-a-commentary-on-mine-reshef-et-al-2011/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to predict how many visits a website will receive on a given day</title>
		<link>http://christopherberry.ca/2011/11/how-to-predict-how-many-visits-a-website-will-receive-on-a-given-day/</link>
		<comments>http://christopherberry.ca/2011/11/how-to-predict-how-many-visits-a-website-will-receive-on-a-given-day/#comments</comments>
		<pubDate>Wed, 16 Nov 2011 16:24:57 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Predictive Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=770</guid>
		<description><![CDATA[Predictive analytics is somewhat mysterious. So, let&#8217;s shed some light on it. (Note that I&#8217;m simplifying this quite a bit to be accessible.) The first step in predictive analytics is to understand what you&#8217;re predicting. We&#8217;ll call this the Y variable. In this instance, &#8216;how many visits from Boston can I expect on a given [...]]]></description>
			<content:encoded><![CDATA[<p>Predictive analytics is somewhat mysterious. So, let&#8217;s shed some light on it.</p>
<p>(Note that I&#8217;m simplifying this quite a bit to be accessible.)</p>
<p><strong>The first step</strong> in predictive analytics is to understand what you&#8217;re predicting. We&#8217;ll call this the Y variable.</p>
<p>In this instance, &#8216;how many visits from Boston can I expect on a given day&#8217;. My Y will be &#8216;Visits&#8217;.</p>
<p>I&#8217;m curious about it.</p>
<p>Have some discipline. I see way too many analysts change the Y variable before their investigation is through.</p>
<p><strong>The second step</strong> is to identify all the variables that might be associated with a variation in Y. These might include factors like paid media, search, new visits, returning visits &#8211; and date. Then there are paid campaigns, posting new content, social campaigns, traditional media spend, promotions, and so on. Day of the week is another key variable, along with statutory holidays, and extending out to other factors like weather and creativity.</p>
<p><strong>The third step</strong> is to extract, transform, and load the data you CAN actually access. You can spend months fighting to build an absolute complete model, or, you CAN start putting together a story with the facts that are available. I chose action over inertia. You should too.</p>
<p>That date field is usually pretty bad to extract, transform, and load. There are functions both in Excel and SPSS that handle dates with some difficulty. Devils abound in the details around &#8216;the date where in the world&#8217;. If your installation is set to Eastern Time, and most of your traffic comes from Australia, you&#8217;ll be one day lagged. You ought to adjust the figures using the appropriate offset.</p>
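<p>A minimal sketch of that one-day lag and its fix. The fixed offsets below are a simplifying assumption (Eastern Standard Time at UTC-5 and Sydney summer time at UTC+11, with daylight-saving changes ignored), and the function name and the sample hit are hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets are an assumption: they ignore daylight-saving changes.
EASTERN = timezone(timedelta(hours=-5), "EST")  # zone the reports are set to
SYDNEY = timezone(timedelta(hours=11), "AEDT")  # zone most visitors are in


def visitor_local_date(reported, visitor_tz=SYDNEY):
    """Convert a timestamp recorded in the report's zone to the visitor's calendar date."""
    return reported.astimezone(visitor_tz).date()


# A hypothetical hit logged at 8 p.m. Eastern on 15 November 2011.
hit = datetime(2011, 11, 15, 20, 0, tzinfo=EASTERN)

print("date in the report:  ", hit.date())
print("date for the visitor:", visitor_local_date(hit))
```

<p>The report files this hit under 15 November, while the visitor experienced it on 16 November. That matters as soon as day of the week becomes a variable in the model.</p>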
<p>The figure below is what I could extract from Google Analytics in about an hour. (Collinearity abounds!)</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/11/predictive-model1.png"><img class="aligncenter size-medium wp-image-771" title="predictive-model1" src="http://christopherberry.ca/wp-content/uploads/2011/11/predictive-model1-300x37.png" alt="" width="423" height="52" /></a></p>
<p><strong>The fourth step is to run the math</strong> against your model.</p>
<p>I use SPSS to run a regression. If you don&#8217;t have SPSS, you can try using open source programs like Octave or R. The reason for using software is that it&#8217;s annoying to do by hand. I didn&#8217;t have a copy of SPSS at my first research position, so I had to code out linear regression in Excel. I learned a lot, but it is not expedient!</p>
<p>The figure below is the output from the software.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/11/simple-model1.png"><img class="aligncenter size-medium wp-image-773" title="simple-model" src="http://christopherberry.ca/wp-content/uploads/2011/11/simple-model1-300x54.png" alt="" width="300" height="54" /></a></p>
<p>The way to read the table is Y = Constant + B1(X1) + B2(X2).</p>
<p>So, Visits = 4.888 &#8211; 1.872 (istheweekend).</p>
<p>If it&#8217;s the weekend, I can predict Visits = 4.888 &#8211; 1.872 (1). Which equals 3.016, or about 3 visits.</p>
<p>If it&#8217;s not the weekend, I can predict Visits = 4.888 &#8211; 1.872(0). Which equals 4.888.</p>
<p>Not bad for Boston traffic! And I understand the impact of a single variable on visits.</p>
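<p>To see where a constant and a B like the ones above come from, here is a minimal sketch of the same kind of fit done by hand. The seven daily visit counts are made up, so the coefficients will not match the SPSS output; the function name is hypothetical as well.</p>

```python
def fit_simple_ols(xs, ys):
    """Least-squares fit of y = constant + b * x for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    b = covariance / variance_x
    constant = mean_y - b * mean_x
    return constant, b


# Made-up week of visits: five weekdays followed by Saturday and Sunday.
is_the_weekend = [0, 0, 0, 0, 0, 1, 1]
visits = [5, 4, 6, 5, 4, 3, 3]

constant, b = fit_simple_ols(is_the_weekend, visits)
print(f"Visits = {constant:.3f} + ({b:.3f})(istheweekend)")
```

<p>With a single yes/no variable, the constant is simply the weekday average (4.8 in this toy week) and B is the gap between the weekend average and the weekday average (3 minus 4.8, or -1.8). That is all the table is saying.</p>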
<p>My dataset is incredibly spiky. So, what&#8217;s causing some of that spikiness? I went through all the dates on which I posted new content, reran the math, and got the table below.</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/11/newpost.png"><img class="aligncenter size-medium wp-image-775" title="newpost" src="http://christopherberry.ca/wp-content/uploads/2011/11/newpost-300x59.png" alt="" width="300" height="59" /></a></p>
<p>The model above is the best. It explains 12.7% of the variance in the set.</p>
<p>The equation is: Visits = 4.496 &#8211; 1.76(istheweekend) + 2.482(newpost).</p>
<p>I can tell &#8211; according to this version of reality &#8211; that if I want the maximum bump from Boston, posting on a weekday is best. And I can tell the proportional impact of each variable.</p>
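<p>Plugging every combination of the two variables into that equation makes the comparison explicit. The coefficients are copied from the equation above; the function name is hypothetical.</p>

```python
from itertools import product

# Coefficients copied from the equation in the post.
CONSTANT, B_WEEKEND, B_NEWPOST = 4.496, -1.76, 2.482


def predict_visits(is_the_weekend, new_post):
    """Predicted Boston visits for one day; both inputs are 0 or 1."""
    return CONSTANT + B_WEEKEND * is_the_weekend + B_NEWPOST * new_post


# Walk through all four scenarios: weekday or weekend, with or without a new post.
for is_the_weekend, new_post in product((0, 1), repeat=2):
    visits = predict_visits(is_the_weekend, new_post)
    print(f"weekend={is_the_weekend} newpost={new_post} -> {visits:.3f} visits")
```

<p>The four scenarios come out at 4.496 (weekday, no post), 6.978 (weekday, new post), 2.736 (weekend, no post) and 5.218 (weekend, new post).</p>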
<p>Sometimes this answer is good enough. There are more advanced methods &#8211; like curvilinear regression, machine learning, and neural networks. There are ways to introduce more variables into the equation. But typically &#8211; this method is sufficient to get a first idea about the relationships among variables and their relative importance, rooted in fact, as opposed to gut bias.</p>
<p><strong>The fifth step</strong> is to make decisions based on scenarios.</p>
<p>If you take this equation and plot it out, you can engage in a few what-ifs. Would writing more weekend-friendly material result in a lower Beta? Would increasing the frequency of new posts drastically improve the performance of the website? If so, by how much? The size of the newpost beta, compared to the total number of Boston visits per day, hints at that relative strength.</p>
<p>That&#8217;s the power of predictive analytics.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/11/how-to-predict-how-many-visits-a-website-will-receive-on-a-given-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Siri and Search</title>
		<link>http://christopherberry.ca/2011/11/siri-and-search/</link>
		<comments>http://christopherberry.ca/2011/11/siri-and-search/#comments</comments>
		<pubDate>Thu, 10 Nov 2011 04:27:30 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Design Thinking]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=764</guid>
		<description><![CDATA[Gary Morgenthaler had a few interesting statements to make: &#8220;Therefore, when Siri was an independent company, its plan was to map these domains deeply and seamlessly to automate transactions for its users within them. For example, “Buy that Steve Jobs biography book and send it to my dad”; “Send a dozen yellow roses to my [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Gary Morgenthaler had a few interesting statements to make:</strong></p>
<blockquote><p>&#8220;Therefore, when Siri was an independent company, its plan was to map these domains deeply and seamlessly to automate transactions for its users within them. For example, “Buy that Steve Jobs biography book and send it to my dad”; “Send a dozen yellow roses to my wife”; “Book me the usual table for 2 tonight at 8 p.m. at Giovanni’s”; and “Get me 2 box seats for the Giants game on Saturday.”</p>
<p>Then comes the question of what solves our biggest problems. Ultimately, Siri’s value is that of automation and removing “friction” on the Internet. Siri achieves this by: (1) understanding speech input in natural language form, (2) mapping user requests against its knowledge base (i.e., ontological domains) and (3) activating software “agents” to interact with Internet service providers to fulfill user requests.&#8221;</p></blockquote>
<p><strong>Source:</strong> <a href="http://techcrunch.com/2011/11/09/gary-morgenthaler-siri-will-eat-google/" target="_blank">TechCrunch</a></p>
<p>Let&#8217;s just forget Google for a minute and focus in on this combination of technologies.</p>
<ul>
<li>Understand.</li>
</ul>
<ul>
<li>Map.</li>
</ul>
<ul>
<li>Act.</li>
</ul>
<p>That&#8217;s the general design pattern for a whole range of applications.</p>
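<p>As a rough illustration of that three-step pattern, here is a toy sketch. Every name in it is hypothetical, the keyword lookup is a crude stand-in for real speech and language understanding, and none of it reflects how Siri is actually built.</p>

```python
# Every name below is hypothetical; this mirrors the three steps, not Siri itself.

DOMAIN_KEYWORDS = {  # "map": a toy knowledge base linking words to domains
    "roses": "florist",
    "table": "restaurant",
    "seats": "ticketing",
}

AGENTS = {  # "act": one stand-in agent per domain
    "florist": lambda request: f"florist agent ordering: {request}",
    "restaurant": lambda request: f"restaurant agent booking: {request}",
    "ticketing": lambda request: f"ticketing agent buying: {request}",
}


def understand(utterance):
    """Step 1: reduce the utterance to lowercase words (stand-in for language parsing)."""
    return [word.strip(".,!?").lower() for word in utterance.split()]


def map_to_domain(words):
    """Step 2: match the words against the knowledge base; None if nothing fits."""
    for word in words:
        if word in DOMAIN_KEYWORDS:
            return DOMAIN_KEYWORDS[word]
    return None


def act(domain, utterance):
    """Step 3: hand the request to the agent for that domain."""
    if domain is None:
        return "no agent for this request"
    return AGENTS[domain](utterance)


def handle(utterance):
    """Run the full understand, map, act pipeline on one request."""
    return act(map_to_domain(understand(utterance)), utterance)


print(handle("Send a dozen yellow roses to my wife"))
print(handle("What is the meaning of life?"))
```

<p>The first request lands in a mapped domain and gets an agent. The second falls outside every domain in the toy knowledge base, so nothing can act on it.</p>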
<p>Certainly nothing new here.</p>
<p>They&#8217;ve solved a good problem. There are certain use cases for which Siri is a great solution.</p>
<p>He ignores the rest of the problem space. And that&#8217;s just fine. I don&#8217;t expect him to point out the subset of infinite use cases that Siri is woefully inadequate for.</p>
<p>Barriers, like a small keyboard, are soon to be resolved by virtual keypads and a range of next generation hand gestures that are sensed, not tactilely received. I don&#8217;t see them as insurmountable.</p>
<p>Even Star Trek TNG made use of both voice and physical commands.</p>
<p>Siri is not a Google-search killer.</p>
<p>It is a nice complement.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/11/siri-and-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Data Science</title>
		<link>http://christopherberry.ca/2011/10/data-science/</link>
		<comments>http://christopherberry.ca/2011/10/data-science/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 14:17:25 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Data Science]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=725</guid>
		<description><![CDATA[Data Science is the mix of computer science, user experience, and statistics. The aim of data science should be: to make things better by influencing people and things to make better decisions, by making people and things more aware of better alternatives, based on better algorithms and more relevant data. Language kept intentionally vague to [...]]]></description>
			<content:encoded><![CDATA[<p>Data Science is the mix of computer science, user experience, and statistics.</p>
<p><strong>The aim of data science should be:</strong></p>
<ul>
<li>to make things better</li>
</ul>
<ul>
<li>by influencing people and things to make better decisions,</li>
</ul>
<ul>
<li>by making people and things more aware of better alternatives,</li>
</ul>
<ul>
<li>based on better algorithms and more relevant data.</li>
</ul>
<p><em>Language kept intentionally vague to set up the &#8216;well that could be anything&#8217; argument when it suits me later.</em></p>
<p>If you do it right, nobody is really aware of the complexity of what just happened to them. The point is not to experience data. The point is to experience&#8230;an experience. And be better off for it!</p>
<p>And, the most interesting part is that it&#8217;s not really driven by humans with hidden agendas. Though, that could play a part. It&#8217;s driven by machines which generate rules that most designers don&#8217;t understand fully.</p>
<p>Haven&#8217;t heard of Data Science? You&#8217;re not alone. It&#8217;s only just become a &#8216;thing&#8217; lately.</p>
<p>The usual fight for the soul of Data Science (the language, identity, ego) has begun in earnest. <a title="Fight for Data Science Sould" href="http://christopher-berry.blogspot.com/2011/10/fight-for-data-science-soul-begins.html">You can read the editorial summary here</a>. This will go on for the better part of a decade, and frankly, nobody outside of the emerging data science community is really going to care. But it&#8217;ll be important to a few. And they&#8217;ll make it a big deal, solely because language contains bias about beliefs, and don&#8217;t question my damned beliefs, dammit.</p>
<p>I don&#8217;t have much of a dog in that fight. I&#8217;d much rather get to the good stuff.</p>
<p><strong>Why am I excited and optimistic about the prospects for Data Science?</strong></p>
<p>Never before has so much data about so much meant so little to so many. The world is filled with waste and genuinely bad things. What if you could make sense of more of it? What would you do then? How much better off would we be?</p>
<p>This is a bit beyond the novelty of <a href="http://www.theonion.com/articles/freakonomist-keeps-close-eye-on-ge-stock-versus-he,17202/">Freakonomics</a>.</p>
<p>You may recall a line of reasoning that James Burke once put forward in his series Connections. He argued that we tend to believe that technological advancement causes the world to become better, when, in all reality, every technological advancement has made the environment worse off while making people relatively better off. There&#8217;s been a tradeoff. It seems that technological advancement is at odds with sustainability.</p>
<p>But does it have to be?</p>
<p>By becoming more aware of cause and effect as individuals, groups, communities, companies, organizations and societies &#8211; can we become better?</p>
<p>It is, after all, not just about tracking the world. It&#8217;s about making sense of all that data too. Thinkers like <a href="http://jeffjonas.typepad.com/">Jeff Jonas</a> have been putting forward ideas about sensemaking for some time, and I take no credit for it. It&#8217;s not so much that the data excites me. It&#8217;s the opportunity that that data opens up.</p>
<p>I think there&#8217;s good reason to believe things can be better.</p>
<p>Picture related. Without meaning, how can you make sense of anything?</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2010/04/mattinglyshavethosesideburns.jpg"><img class="aligncenter size-full wp-image-157" title="mattinglyshavethosesideburns" src="http://christopherberry.ca/wp-content/uploads/2010/04/mattinglyshavethosesideburns.jpg" alt="" width="200" height="150" /></a></p>
<p>***</p>
<p>I&#8217;m Christopher Berry.</p>
<p>I bridge the gap between <a href="http://www.webanalyticsassociation.org/members/blog_view.asp?id=538344" target="_blank">marketing science and data science</a>.</p>
<p>I welcome connections on <a href="http://twitter.com/cjpberry" target="_blank">Twitter</a> and <a href="http://www.linkedin.com/profile/view?id=26002267" target="_blank">LinkedIn</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/data-science/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How consumers use mobile for shopping</title>
		<link>http://christopherberry.ca/2011/10/how-we-use-mobile-for-shopping/</link>
		<comments>http://christopherberry.ca/2011/10/how-we-use-mobile-for-shopping/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 14:28:27 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Mobile Analytics]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=747</guid>
		<description><![CDATA[How consumers are using mobile to shop IRL (In Real Life) is of paramount interest now that mobile has finally arrived. A few figures to run through. The first, below, describes what consumers report they want from mobile phone applications, for the holidays, in August 2011. A common behavior, well known to clicks-and-bricks retailers, is [...]]]></description>
			<content:encoded><![CDATA[<p>How consumers are using mobile to shop IRL (In Real Life) is of paramount interest now that mobile has finally arrived. A few figures to run through. The first, below, describes what consumers report they want from mobile phone applications, for the holidays, in August 2011.</p>
<p>A common behavior, well known to clicks-and-bricks retailers, is that consumers will research products before coming in store to buy them. This is especially true of electronics goods, but I suppose it&#8217;s conceivable they do it for home appliances, automotive purchases, and anything else that is generally of high consideration. Mobile offers the capability of researching while you&#8217;re physically in the store. And, since most stores are now ghost towns, it enables the consumer to help themselves.</p>
<p>Expect more of that in December 2011.</p>
<p>&nbsp;</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/10/mobileinformation.gif"><img class="aligncenter size-full wp-image-748" title="mobileinformation" src="http://christopherberry.ca/wp-content/uploads/2011/10/mobileinformation.gif" alt="" width="324" height="207" /></a></p>
<p>Note the desire for coupons and sale information. People want deals, dammit. It&#8217;s not exactly something I&#8217;d be pushing if I were a mobile marketer. Why cannibalize my in-store sales? Well &#8211; I might think of a way to drive urgency using the device. But I wouldn&#8217;t want to throw a &#8220;20% off&#8221; display ad just because I want proof linking the mobile channel to in-store sales. Certainly, there could be a mechanism. A reward of some type, perhaps.</p>
<p>Finally, there&#8217;s that 32% figure that sticks out. &#8216;Buying products&#8217;. It&#8217;s 2003 all over again and smartphones are to mobile commerce as broadband was to ecommerce.</p>
<p>The second set of statistics follows below. They used a control group and they&#8217;re reporting the differences. It&#8217;s suggesting that mobile is more effective at driving a number of brand metrics (not direct attribution metrics like a web analyst might assume). They&#8217;re reporting on the relative impact of the channel on self-reported attitudinal changes, post exposure.</p>
<p>&nbsp;</p>
<p><a href="http://christopherberry.ca/wp-content/uploads/2011/10/mobiledisplayversusonline.gif"><img class="aligncenter size-full wp-image-749" title="mobiledisplayversusonline" src="http://christopherberry.ca/wp-content/uploads/2011/10/mobiledisplayversusonline.gif" alt="" width="324" height="376" /></a></p>
<p>A summary states:</p>
<p>&#8220;According to Dynamic Logic, there are three important factors that drive a successful mobile campaign. They are the location of a brand name or logo within a mobile ad matters: left-side brand placement is generally most effective and has a strong impact on advertising recall; clear and persistent branding is important for brand awareness and a strong call-to-action encourages interactivity and engagement and helps drive purchase intent.&#8221; <a href="http://email-marketing-companies.tmcnet.com/topics/email-marketing-companies/articles/228314-mobile-advertising-more-effective-than-online-advertising.htm">Source</a>.</p>
<p>The takeaway is not &#8220;use mobile to drive awareness&#8221;. That is not a good takeaway. Mobile is not a mass awareness channel, any more than paid search is. It&#8217;s not the way the channel works and it&#8217;s certainly not the way consumers want the channel to be used with them. Do you really want to hit people with an SMS coupon every time they visit Deborah in accounting at the north side of the building? (It&#8217;s just within the 200m radius of a Starbucks). That&#8217;s the wrong takeaway, even if it is highly likely that awareness is higher. (It better be, there&#8217;s less on the screen to look at.)</p>
<p>Mobile, good mobile, forces much more discipline. It demands subtraction. It demands that choices be made. This isn&#8217;t a corporate webpage where everything can be added.</p>
<p>There&#8217;s more constraint because there&#8217;s more constraint.</p>
<p>Finally, there&#8217;s Korea. It&#8217;s the last piece of evidence I&#8217;ll put forward.</p>
<p>The video below explains how Korean marketers are helping people rescue otherwise wasted time. In this instance, it&#8217;s shopping from the subway, using smartphones and codes.</p>
<p><iframe src="http://www.youtube.com/embed/nJVoYsBym88" frameborder="0" width="420" height="315"></iframe></p>
<p>This represents a fairly impressive increase in productivity. Mobile enables consumers to be more productive in their lives by converting what was previously wasted opportunity into rescued time. You&#8217;re also resurrecting outdoor display advertising and commanding direct consumer attention AND action. It&#8217;s awesome and goes well beyond &#8216;click this QR code to see our awesome marketing microsite&#8217;.</p>
<p>Recall the product adoption lifecycle. Innovators will try things simply because they&#8217;re novel. There&#8217;s a long chasm. Is that chasm ever brutal. At the other side of it there are early-adopters. Early-adopters will try things because it&#8217;s obvious that they will be useful. What we&#8217;re seeing here is some evidence that we&#8217;re through the chasm, at least when it comes to porting very common digital activities that used to happen on a laptop, over to a mobile device. The grayer area is the role of portable devices (tablets) and that role in driving changes in consumer behavior at mass.</p>
<p><strong>How would you use mobile, not so much to increase awareness (it&#8217;s not a mass channel) but to complete the action-purchase portion of the conversion cycle?</strong></p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/how-we-use-mobile-for-shopping/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Web Analytics Wednesday &#8211; October 26 &#8211; Wellington</title>
		<link>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/</link>
		<comments>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/#comments</comments>
		<pubDate>Wed, 26 Oct 2011 14:30:29 +0000</pubDate>
		<dc:creator>Christopher Berry</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Analytics Strategy]]></category>
		<category><![CDATA[Complexity Analytics]]></category>
		<category><![CDATA[Complexity Economics]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Mobile Analytics]]></category>
		<category><![CDATA[Social Analytics]]></category>
		<category><![CDATA[Social Media Analytics]]></category>
		<category><![CDATA[Social Media Measurement]]></category>

		<guid isPermaLink="false">http://christopherberry.ca/?p=743</guid>
		<description><![CDATA[Web Analytics Wednesday is tonight at The Wellington, in downtown Toronto&#8217;s analytics alley. It&#8217;s generously supported by AT Internet. Some 40 people, representing the best of the best, will be in attendance. It&#8217;s a great opportunity for web analysts, social analysts, marketing scientists, data scientists, hackers, developers, and usability professionals [...]]]></description>
			<content:encoded><![CDATA[<p>Web Analytics Wednesday is tonight at <a href="http://www.barwellington.ca/">The Wellington</a>, in downtown Toronto&#8217;s analytics alley. It&#8217;s generously supported by <a href="http://en.atinternet.com/">AT Internet</a>. Some 40 people, representing the best of the best, will be in attendance. It&#8217;s a great opportunity for web analysts, social analysts, marketing scientists, data scientists, hackers, developers, and usability professionals to come out and talk about the great ideas and opportunities we have going on in Toronto.</p>
<p>It&#8217;s also the first get-together after eMetrics New York, which was a major event and had big-time Canadian attendance. These tend to be among the more interesting evenings. It has also been some three months since the last WAWTO event, so there should be quite a few fresh stories.</p>
]]></content:encoded>
			<wfw:commentRss>http://christopherberry.ca/2011/10/web-analytics-wednesday-october-26-wellington/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

