Audrey Watters wrote a good article for O’Reilly Radar about Data Science in 2011. Audrey cites three big events and trends: Hadoop, an open source distributed computing framework, became ubiquitous; “More Data, More Privacy Problems”, citing the Apple scandal as an example; and “Open Data at an Inflection Point” (with a great shoutout to my friends at BuzzData!). Editorial: Even Hadoop has warts (gasp!?!111shiftoneone), but so far there are good anecdotes about it working out well for many companies, and you can expect a few horror stories in 2012. The devices we’re carrying and our own behaviors are generating more data than ever, and people (that means you) need to be aware of when they’re consenting to[…]

Steve Miller authored a very good article about Data Science Skepticism over at Information Management. I’ve previously written about Data Science and shared an excellent video about what makes a great data scientist; both posts are expanded primers on the emerging field. The TL;DR version is: a Data Scientist (DS) sits at the intersection of computer science, statistical methods, and business. I won’t define what Business Intelligence (BI) is. There’s an EMC study making the rounds, and Steve Miller takes exception to some portions of it. To summarize Steve Miller: the EMC survey makes statements about BI practitioners that are unnecessarily polarizing, and data scientists should view those findings with suspicion (which should be their natural inclination anyway).[…]

Tyler Nichols writes: “I am done with the freemium model”. Tyler divided all the users of his service into two groups, free and paid, and measured the behaviors of each group. He found that the free group was detrimental to his business: free users emailed more questions on average than paid users did, and they hit the spam button when he emailed them a follow-up, while paid users didn’t. Free customers were not worth the maintenance costs they caused. Hacker News and other communities replied (paraphrased): free users were not as engaged, and therefore more reckless; it was a Santa letter generator, which has low repeat value after the season; the plural of anecdote isn’t evidence; you’ve added little value to freemium[…]
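The “plural of anecdote” objection boils down to one question: is the gap between the free and paid groups bigger than chance alone would produce? A standard two-proportion z-test is one quick way to check a rate such as spam complaints. This is a sketch of that generic test, not anything Tyler or the commenters actually ran; the function name `two_proportion_z_test` and the counts in the usage block are hypothetical.

```python
import math


def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Two-sided z-test for a difference between two rates.

    Hypothetical framing: hits_a / n_a could be spam complaints among
    free users, hits_b / n_b the same among paid users.
    Returns (z, p_value).
    """
    rate_a = hits_a / n_a
    rate_b = hits_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (hits_a + hits_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # Both rates are 0 or both are 1: no difference to test.
        return 0.0, 1.0
    z = (rate_a - rate_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 30 complaints from 1,000 free users, 5 from 500 paid users.
    z, p = two_proportion_z_test(30, 1000, 5, 500)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value only says the gap is unlikely to be pure noise. It says nothing about why, so the seasonality point (a Santa letter generator) would still need its own check.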

I used this blog to talk to very specific groups. Sometimes it was marketers. Sometimes web analysts. Sometimes candidates applying for a position. Sometimes data scientists, brandsters, and social analysts. Sometimes this worked. Sometimes I confused the hell out of different audiences at different times. I’ll continue to speak to web analysts through the research committee of the Web Analytics Association, in particular through a new experiment we’re launching and through the ongoing Peer Review Journals. I’ll continue to speak and collaborate with ultra-niche communities (data scientists, marketing scientists, and open data professionals) through christopherberry.ca. Eyes on Analytics is shifting. I’ll be curating content not just from marketing analytics, but also from further afield. My goal is[…]

It’s worth explaining the Gartner Hype Cycle, since it’s topical for 2012. It works as follows. Usually, many people invent a technology within the same window of time. Somebody really gets hooked on the idea and executes it well enough to produce a technology trigger. That gets the ball rolling. Awareness spreads through a single market, and then into adjacent markets. Excitement spreads like fire. People are quick to see potential. Enthusiasm is contagious, and opposing views are downvoted into gray obscurity. Innovators are visionaries, after all. (I’m winking, pointing a finger at you, and making a ‘click click’ sound with my voice. ‘Hey, click click’.) This is an impolite way of saying that ‘ignorance increases’. Hype[…]

You may have read something about ‘Detecting Novel Associations in Large Data Sets’, a paper by David N. Reshef et al. appearing in Science, 334, 1518 (2011). You can check out the software here. This post is an initial commentary on, and explanation of, what it’s all about.
The Longer You Look, the More Likely Error Will Find You
Take a very large dataset, say, all the customers of AT&T and their calling records from 2001 to 2011, and divide it into two random, equal-sized sets. Say you didn’t have any hypothesis at all. You just wanted to see what was related to what in that set. Say each customer record has 5,000 features, including gender, date of birth, credit score,[…]
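Here is a small simulation of why error finds you the longer you look. It is a sketch under stated assumptions, not the method from the Reshef et al. paper: it uses plain Pearson correlation instead of their maximal information coefficient, pure random noise instead of real calling records, and 30 features instead of 5,000 so it runs quickly. Every “association” it finds in half A is spurious by construction, and the split-half check shows how few of them survive in half B. The function names are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def strong_pairs(rows, threshold):
    """Return the set of feature-index pairs whose |r| exceeds threshold."""
    columns = list(zip(*rows))  # transpose rows into feature columns
    return {
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if abs(pearson(columns[i], columns[j])) > threshold
    }


def split_half_check(n_rows=1000, n_features=30, seed=0):
    rng = random.Random(seed)
    # Pure noise: no feature is truly related to any other.
    rows = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    rng.shuffle(rows)  # random assignment to the two halves
    half_a, half_b = rows[: n_rows // 2], rows[n_rows // 2 :]
    # Rough 5% two-sided cutoff for |r| under the null: about 2 / sqrt(n).
    threshold = 2 / math.sqrt(len(half_a))
    found = strong_pairs(half_a, threshold)
    replicated = found & strong_pairs(half_b, threshold)
    total_pairs = n_features * (n_features - 1) // 2
    return total_pairs, len(found), len(replicated)


if __name__ == "__main__":
    total, found, replicated = split_half_check()
    print(f"{total} pairs tested, {found} 'associations' in half A, "
          f"{replicated} survive in half B")
```

With 435 pairs tested at a roughly 5% cutoff, a couple dozen false hits in half A is expected; scale that to 5,000 features (about 12.5 million pairs) and the noise becomes enormous, which is why holding out the second half matters.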

I live and work at one of the most amazing intersections. It’s also why things don’t always mean what people assume they mean. There are technologists – developers and computer scientists – who grapple with the limitations imposed by APIs and big data. There are marketing scientists – analysts and statisticians – who grapple with the limitations imposed by computability and understandability. There are marketers – brand and channel – who grapple with the limitations imposed by budget and cognitive surplus. It’s pretty amazing how a technologist, a marketing scientist, and a marketer can all be right within their own silo, their own way of thinking, but collectively be misunderstood and wrong. The confluence of all three[…]

It’s an annual tradition. We navel-gaze. Every December. It’s as predictable as the tides. So, let’s talk 2012. A Forrester author, Joe Stanhope, asked us what we wanted to be when we grew up. I replied, ‘doing something meaningful’, or something to that effect. I meant it. Let’s consider something really meaningful that’s happening right now. Joe painted a picture of accelerating media fragmentation and bloatware trying to keep up. Indeed, I’m encountering more analysts taking up pitchforks against the new-new media. After all, if we can’t even do x right, what business do we have even attempting y? Because it’s there. It’s never been a better time to be a marketing scientist or an analyst. It’s never[…]

Danny Sullivan wrote a pretty good blog post about an article getting deleted. You can read it here. I’m not especially interested in it, or outraged about it. This spawned a Hacker News thread. You can read the whole thing here. The comment I want to draw attention to comes to us from Phil Welch. It’s so good that I’m quoting it below. “Turns out if you throw together a few thousand neckbeards and convince them to play status games around building an encyclopedia, you get an encyclopedia. You also get a whole lot of stupid politics, wasted energy, process wanking, flamewars, and acronym-laden cryptic discourses where words like “arbitration” have strange, Orwellian connotations. (“Arbitration” is Wikipedia’s name for the process governing,[…]

Predictive analytics is somewhat mysterious. So, let’s shed some light on it. (Note that I’m simplifying this quite a bit to keep it accessible.) The first step in predictive analytics is to understand what you’re predicting. We’ll call this the Y variable. In this instance, the question is ‘How many visits from Boston can I expect on a given day?’ My Y will be ‘Visits’, because that’s what I’m curious about. Have some discipline: I see way too many analysts change the Y variable before their investigation is through. The second step is to identify all the variables that might be associated with variation in Y. These might include factors like paid media, search, new visits, returning visits – and date. Then there are paid[…]
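The first two steps can be sketched in Python: fix Y once, list the candidate X variables, and screen them by how strongly each one moves with Y before fitting anything. Everything here is hypothetical. The daily data is synthetic (visits are generated from paid media plus a weekend dip, with `search` deliberately unrelated), the column names and functions are made up for illustration, and correlation screening followed by a one-variable least-squares line is a deliberate simplification, not a complete predictive model.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def make_days(n_days=365, seed=1):
    """Hypothetical daily data; 'visits' is built from paid media and weekends."""
    rng = random.Random(seed)
    days = []
    for d in range(n_days):
        paid_media = rng.uniform(0, 100)  # hypothetical spend units
        search = rng.uniform(0, 50)       # hypothetical, unrelated to visits here
        weekend = 1 if d % 7 in (5, 6) else 0
        visits = 200 + 3 * paid_media - 80 * weekend + rng.gauss(0, 20)
        days.append({"paid_media": paid_media, "search": search,
                     "weekend": weekend, "visits": visits})
    return days


Y = "visits"  # step one: decide the target once and keep it fixed
CANDIDATES = ["paid_media", "search", "weekend"]  # step two: possible drivers


def screen(days):
    """Rank candidate X variables by |correlation| with Y."""
    ys = [day[Y] for day in days]
    scores = {x: pearson([day[x] for day in days], ys) for x in CANDIDATES}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    days = make_days()
    ranking = screen(days)
    for name, r in ranking:
        print(f"{name:>10}: r = {r:+.2f}")
    best = ranking[0][0]
    a, b = fit_line([day[best] for day in days], [day[Y] for day in days])
    print(f"visits ~ {a:.0f} + {b:.2f} * {best}")
```

Note that Y never changes while the candidate list is screened; adding or dropping an X is cheap, but swapping Y midway would invalidate every comparison made so far.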