You may notice periodic sessions in the web analytics data: people visiting the website, looking at a few pages, and leaving, never to return. Or returning frequently, always checking the same pages. These are usually discount seekers, and there’s a data science niche building around them. Price interception is one of the biggest trends in big data science that we’re not telling you about. What’s driving it? Discount seekers exist in any market, but this segment grows when consumer sentiment is low. And there is a whole bunch of new technologies designed to make people feel good about bargain hunting, contributing to the growth, and likely the establishment, of the sector. Many are building up an information advantage against firms, and using[…]
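
One way to make the "returning frequently, always checking the same pages" pattern concrete is a small heuristic over session logs. This is a minimal sketch, not an established method: the input format (a list of `(visitor_id, pages)` tuples), the function name `flag_repeat_checkers`, and the thresholds `min_sessions` and `min_overlap` are all hypothetical. It measures "same pages" as the average Jaccard similarity between a visitor's consecutive sessions.

```python
from collections import defaultdict


def flag_repeat_checkers(sessions, min_sessions=3, min_overlap=0.8):
    """Flag visitors who return often and keep viewing the same pages.

    sessions: hypothetical list of (visitor_id, [page, ...]) tuples in chronological order.
    """
    by_visitor = defaultdict(list)
    for visitor, pages in sessions:
        by_visitor[visitor].append(set(pages))

    flagged = set()
    for visitor, page_sets in by_visitor.items():
        if len(page_sets) < min_sessions:
            continue  # not enough return visits to call it a pattern
        # Jaccard similarity between each pair of consecutive sessions.
        sims = []
        for a, b in zip(page_sets, page_sets[1:]):
            union = a | b
            sims.append(len(a & b) / len(union) if union else 1.0)
        if sum(sims) / len(sims) >= min_overlap:
            flagged.add(visitor)
    return flagged


log = [
    ("v1", ["/sale", "/pricing"]), ("v2", ["/home"]),
    ("v1", ["/pricing", "/sale"]), ("v2", ["/blog"]),
    ("v1", ["/sale", "/pricing"]), ("v2", ["/about"]),
    ("v3", ["/sale"]),
]
print(flag_repeat_checkers(log))
```

Here `v1` keeps checking the same two pages, `v2` returns but browses different pages each time, and `v3` visits once, so only `v1` is flagged. Real thresholds would have to be tuned against your own data.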

My first experience with product management was a course called ‘software engineering’. Of the fifteen student teams using the full-throttle software engineering method, complete with UML and a short burst of up-front requirements gathering, eight failed to deliver a product. Each of the three successful teams I knew of had one person who did all the software development, while the rest of the team handled the documentation and worked the waterfall. Certainly, this was quantitative evidence that software engineering, as it was conceived of back then, was really ineffective. How could mechanical engineers hang a VW Bug from a sculpture, reliably and safely, using their engineering principles, while a group of software[…]

You may have read something about ‘Detecting Novel Associations in Large Data Sets’, a paper by David N. Reshef et al. appearing in Science, 334, 1518 (2011). You can check out the software here. This is an initial commentary and an explanation of what it’s all about.

The Longer You Look, The More Likely Error Will Find You

Take a very large dataset, say, all the customers of AT&T and their calling records from 2001 to 2011, and divide it into two random, equal-sized sets. Say you didn’t have any hypothesis at all. You just wanted to see what was related to what in that set. Say each customer record has 5000 features, including gender, date of birth, credit score,[…]
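
The "longer you look" problem can be simulated on data where nothing is related to anything. The sketch below is an illustration under stated assumptions, not the paper's method: it uses plain Pearson correlation rather than Reshef et al.'s maximal information coefficient, and the dataset size, the `threshold` of 0.2, and the function names are hypothetical. It splits pure noise into two random halves, searches every feature pair in one half with no hypothesis, and then checks which "discoveries" survive in the other half.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spurious_hits(n_rows=200, n_features=40, threshold=0.2, seed=1):
    """Search pure noise for 'associations', then re-test them on a held-out half.

    Returns (pairs found in the exploration half, pairs also passing in the confirmation half).
    """
    rng = random.Random(seed)
    # Every feature is independent Gaussian noise: no real relationships exist.
    data = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    rng.shuffle(data)  # random split into two equal halves
    half = n_rows // 2
    explore, confirm = data[:half], data[half:]

    def column(rows, j):
        return [row[j] for row in rows]

    # Hypothesis-free search: test every pair of features in the exploration half.
    found = [
        (i, j)
        for i in range(n_features)
        for j in range(i + 1, n_features)
        if abs(pearson(column(explore, i), column(explore, j))) > threshold
    ]
    # Only pairs that clear the same bar in the untouched half count as replicated.
    replicated = [
        (i, j) for i, j in found
        if abs(pearson(column(confirm, i), column(confirm, j))) > threshold
    ]
    return len(found), len(replicated)


found, replicated = spurious_hits()
print(f"found {found} 'associations' in noise; {replicated} replicated")
```

With 100 rows per half, roughly 5% of the 780 feature pairs would be expected to cross the 0.2 threshold by chance alone, and only a small fraction of those should cross it again in the held-out half. Scale the feature count up toward 5000 and the number of chance "findings" grows with the number of pairs tested.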

Gary Morgenthaler had a few interesting statements to make: “Therefore, when Siri was an independent company, its plan was to map these domains deeply and seamlessly to automate transactions for its users within them. For example, ‘Buy that Steve Jobs biography book and send it to my dad’; ‘Send a dozen yellow roses to my wife’; ‘Book me the usual table for 2 tonight at 8 p.m. at Giovanni’s’; and ‘Get me 2 box seats for the Giants game on Saturday.’ Then comes the question of what solves our biggest problems. Ultimately, Siri’s value is that of automation and removing ‘friction’ on the Internet. Siri achieves this by: (1) understanding speech input in natural language form, (2) mapping user requests[…]
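
To make the idea of routing a request into a transaction domain concrete, here is a deliberately naive keyword router applied to the example commands. This is a hypothetical sketch; it is not how Siri worked, and the domain names and keyword lists are invented for illustration.

```python
import re

# Hypothetical keyword lists; a real assistant would need far richer language understanding.
DOMAIN_KEYWORDS = {
    "books": {"biography", "novel", "paperback"},
    "flowers": {"roses", "flowers", "bouquet"},
    "restaurants": {"table", "reservation", "dinner", "tonight"},
    "tickets": {"seats", "tickets", "game"},
}


def route(request):
    """Return the domain whose keywords best match the request, or None if nothing matches."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    scores = {domain: len(words & keywords) for domain, keywords in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(route("Book me the usual table for 2 tonight at 8 p.m. at Giovanni's"))
```

Note that "book" is left out of the keyword lists on purpose: in "Book me the usual table" it is a verb meaning "reserve", so matching it as a noun would send a restaurant request to the bookstore. That kind of ambiguity is exactly why step (1), genuine natural language understanding, matters.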

Data Science is the mix of computer science, user experience, and statistics. The aim of data science should be to make things better by influencing people and things to make better decisions, by making people and things more aware of better alternatives, based on better algorithms and more relevant data. I’ve kept the language intentionally vague so I can make the ‘well, that could be anything’ argument later, when it suits me. If you do it right, nobody is really aware of the complexity of what just happened to them. The point is not to experience data. The point is to experience…an experience. And be better off for it! And the most interesting part is that it’s not really driven by humans with[…]

Web Analytics Wednesday is tonight at The Wellington, in downtown Toronto’s analytics alley. It’s generously supported by AT Internet. Some 40 people, among the best of the best, will be in attendance. It’s a great opportunity for web analysts, social analysts, marketing scientists, data scientists, hackers, developers, and usability professionals to come out and talk about the great ideas and opportunities we have going on in Toronto. It’s also the first get-together after eMetrics New York, which was a major event with big-time Canadian attendance. These tend to be among the more interesting evenings. It has also been some three months since the last WAWTO event, so there should be quite a few[…]

Not all data is usable on its own. The vast majority of it isn’t, at least not in its raw form. It’s coal. It has potential, but on its own it has limited uses. Algorithms are the modern-day equivalent of machinery. Fire (combustion) is really just statistical analysis: a violent process that generates waste in the form of heat and soot. Our modern-day Watt pump is Google. Its coal is HTML. The best coal used to be the HREF link. The algorithm that drives Google’s primary product is PageRank. It runs on a massive amount of coal. Most people aren’t aware of the complexity involved, and why should they be? All the mine owners really cared about was[…]
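
To show how PageRank turns HREF links into a ranking, here is the textbook power-iteration form on a tiny, made-up link graph. It is a simplified sketch, not Google's production system. It assumes the commonly cited damping factor of 0.85 and one common convention for dangling pages (pages with no outgoing links spread their rank evenly over all pages).

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives the 'random jump' share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical three-page web: a links to b and c, b links to c, c links back to a.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

On this graph, `c` comes out on top because it collects links from both `a` and `b`, while `b` ranks last because only half of `a`'s rank flows to it. The ranks always sum to 1, so each iteration redistributes a fixed amount of "coal" rather than creating more.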