This is Big Data Week in Toronto. On Monday, I’ll be delivering a case study on the business value of that data, using a rather small but beautifully complex dataset. Big Data has now just become a marketing term. Those who have put in the effort, and read the three or four HBR articles on the subject, know more than 80% of the population. If you’ve read up on some of the applications involved, you’re ahead of 95%. If you read this, you’re ahead of 99.99% of the population. So, there’s an incentive to read on. What is Big Data? A good definition of Big Data is anything that is generally too big to fit in the memory of[…]

You may notice periodic sessions in the web analytics data: people visiting the website, looking at a few pages, and leaving, never to return. Or returning frequently, always checking the same pages. These are usually discount seekers, and there’s a data science niche building around them. Price interception is one of the biggest trends in big data science that we’re not telling you about. What’s driving it? Discount seekers exist in any market, but this segment grows when consumer sentiment is low. And there are a whole bunch of new technologies designed to make people feel good about bargain hunting, contributing to the growth, and likely establishment, of the sector. Many are building up an information advantage against firms, and using[…]

We get these wonderful clues about people all the time. It’s easy to lose sight of the macroeconomic situation when you’re focused on the trees, or at least so many logs at the mill. Let’s take a look. You’re looking at consumer sentiment. It’s an index, where 1996 is set at 100. The gray bars are periods when the economy was in a technical recession. When this number is high, more consumers feel good about spending money. They feel like it’s a good time to buy a major household item. They think they’re better off now than they were last year. They think the prospects for the next six months look good. They think things are looking up for[…]

There are (at least) four reasons why analysts are picking Python. For a decade now, I’ve relied on Python for simulation and data ETL, and I’ve depended on SPSS or R for data analysis. The reason for the two-step (and sometimes three-step, if we include Excel) is that there were no good libraries that could really replace SPSS or R completely. SciPy and NumPy are excellent for operating on well-formed arrays of data, but are decidedly less efficient, from a user perspective, at handling data. Data frames, popularized by R, are finally available in Python through a package called pandas. And it’s a nice library. SciPy and NumPy, two very popular libraries, are still out there in use too.[…]

The original intent of the D-LID project, the Design Lab for Interpreted Data, was to generate facts about the way different people interpreted digital analytics data. It was to be a website with a few treatments of the same dataset. Participants would be watched to see which treatments they found useful. With some help from Bayes, we’d put some hard core facts on the table about data design in the context of different audiences. We’d make the data available to members of the DAA for a year, and then open it up to the public thereafter. After it was all scoped out, the median estimated price tag was too much. In talking to partners, we got that figure down to[…]

The European Special Interest Group (SIGEU) of the Digital Analytics Association put out a whitepaper on privacy compliance in December. You can get a copy of it here. It’s an excellent paper. It not only summarizes cookie laws in the EU, but also contains evidence of tracking collapse and the consequences of the interruption caused by the opt-in provision. This is particularly important. The HTTP Cookie was invented in 1994. Its original purpose was to measure the proportion of browsers that were first-time visitors to a site. It spawned thousands of new inventions. Because that’s what technologists do. That’s what humans do. We use tools and invent new uses for them. So it went with the cookie. Look at[…]

Kleinbl00 wrote an excellent synthesis of the phenomenon gripping Reddit right now. (Explanation of what Reddit is here.) Here’s the link. Here’s the quote for posterity: “It isn’t a brain drain, it’s climate change. Early Reddit was an environment friendly towards tech geeks who wanted something more indepth than slashdot or HN. As such, it attracted erudite geeks. Middle Reddit was an environment friendly towards thinkers and seekers who were looking for discussion beyond what was available on the archetypal PHPBBs, news outlet comment sections and, notably, Digg. As such, it attracted thinkers and seekers. Late Reddit is an environment friendly towards image macros and memes. As such, it attracts ineloquent teenagers. Something Reddit did early on, under Alexis and Steve,[…]

Well, that’s one way to validate your heuristics. http://5000best.com Note the use of an aggregate average rating as the first column. That’s likely designed to have an effect on your perception. (Machines can adjust your opinion!) Check it out.

Why is SoLoMo the best trend? It’s the newest! (#YOLO) Remember that meeting in 2003, and then again in 2004, and 2005, and 2007, and again in 2009, when somebody would come into the room and pitch: “Imagine this, you’re walking down the street, and you get an SMS for a free Starbucks coffee because you’re 40 feet away from one! Wouldn’t that be AWESOME???” Picture related. (No, that would not be awesome. I’d get a coupon offer every 50 feet walking through downtown.) Well, they’re back, baby! And, with new jargon to boot. SoLoMo! SoLoMo had a competing concept, called LoSoMo, during the early part of 2012. I’m not quite sure if the repositioning of the Lo to the[…]

Consumer-centric analytics. A lot of money is about to be spent convincing you that a 360-degree view of the consumer can be constructed using scraped data sources. The clickstream paradigm isn’t consumer-centric analytics. I’ve said it before. In this post, we’ll look at a problem-solution-opportunity set. The Problem Set There’s a lot of counting going on. The counting of views through the iPad versus views through Facebook versus views through the work computer versus views through the home laptop. There are pockets of some pretty good usability analysis, some very good optimization, and we’re finally getting some real statistical rigor into digital analytics in a few places. It’s great to see. Better information ought to be causing better[…]