It’s very easy to be cynical about new ideas, especially when they’ve been previously hyped and previously failed. Ideas fail. Statistically, failure is the norm. I’ve been asking myself the question: “What’s different today that might make yesterday’s fad become sustainable?” There are three broad analytical areas that are prime for re-discovery and a fresh round of hype: Splimes. Augmented Reality – GIS. Website Morphing. Reasons for skepticism: I don’t want my refrigerator to tweet when it’s empty. I don’t want to give brands yet another channel to spam me with coupons. I find the Internet hard enough to use; I don’t need my favourite sites changing all the time. What’s different now: I want to make things that are[…]
Author: Christopher Berry
Business Intelligence is not Data Science. There are a lot of ‘yeah but’ statements emanating from some in the BI community. TL;DR summary: Yeah but, it’s all about driving business insights from the data! Yeah but, Data Science still uses all the same BI tools we use! Yeah but, Data Science is really just what BI was years ago! A perspective: No. BI is about using asymmetrical information advantage to extract surplus from customers. Data Science is discovering Pareto optima between the customer and the business. No. Data Science is not religious about toolsets. No. Data Scientists have seen what went wrong with BI. Meeting the same fate would be a failure. What I stand for as a Data Scientist:[…]
Have you ever heard anybody use the sentence: “The problem with that model is that it overfits the data.” Ever wonder what that means? The purpose of science is to use knowledge to make good predictions about the future. To do so, you use theories which inform models. Models are deliberate simplifications of the world which make explicit statements about the direction of the arrow of causality, and are judged to be useful only if the assumptions are actually good. A good model makes accurate predictions about the future. That supposes that the assumptions which underpin the model are the best proxies for how nature actually works. [Data scientists: If you have a problem with what I wrote here, leave[…]
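To make "overfits the data" concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the data is synthetic (a straight line plus noise), and the two models, a least-squares line and a "memorizer" that simply repeats the nearest training answer, are stand-ins chosen to show the effect. The memorizer is perfect on the data it has seen and worse on data it has not, which is what overfitting looks like.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (the simple model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_memorizer(xs, ys):
    """Predict by repeating the y of the nearest training x (memorizes noise)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Synthetic data: an assumed true relationship y = 2x + 1 plus noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys

rng = random.Random(0)
train_x, train_y = make_data(300, rng)   # data the models get to see
test_x, test_y = make_data(300, rng)     # "the future": data they never saw

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

results = {
    "line": {"train": mse(line, train_x, train_y),
             "test": mse(line, test_x, test_y)},
    "memorizer": {"train": mse(memorizer, train_x, train_y),
                  "test": mse(memorizer, test_x, test_y)},
}

for name, errors in results.items():
    print(f"{name:10s} train MSE={errors['train']:.3f}  test MSE={errors['test']:.3f}")
```

The memorizer's training error is zero because it has, in effect, learned the noise. Judged on fresh data, the simpler line should come out ahead.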
Yesterday I concluded that “Existing theoretical frameworks assume too much, and demand too much cognition by the end user.” The opposite of asking you to think about linear regression or support vector machines is Netflix. Netflix uses a machine algorithm to suggest movies that you might like. They do this using a few sources: When you first sign up, Netflix asks a few questions about you. They have a prior viewing history of all their subscribers before you, who also answered a few questions about themselves. You tell them what you like by watching various movies and shows. You tell them more by rating them on a five-star rating system. By comparing your tastes to other people like[…]
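The idea of comparing your tastes to other people's can be sketched in a few lines. This is not Netflix's actual algorithm, which the excerpt does not describe; it is a generic user-based collaborative filter, with made-up subscribers, made-up titles, and made-up five-star ratings. Similarity here is cosine similarity over the titles two people have both rated, which is one common choice among several.

```python
import math

# Hypothetical five-star ratings; names and titles are invented for illustration.
ratings = {
    "ana": {"Space Saga": 5, "Robot Noir": 4, "Rom Com": 1},
    "ben": {"Space Saga": 5, "Robot Noir": 5, "Rom Com": 1,
            "Cyber Chase": 5, "Wedding Weekend": 1},
    "cat": {"Space Saga": 1, "Robot Noir": 2, "Rom Com": 5,
            "Cyber Chase": 2, "Wedding Weekend": 5},
}

def similarity(a, b):
    """Cosine similarity between two users over the titles both have rated."""
    shared = set(ratings[a]) & set(ratings[b])
    if not shared:
        return 0.0
    dot = sum(ratings[a][t] * ratings[b][t] for t in shared)
    norm_a = math.sqrt(sum(ratings[a][t] ** 2 for t in shared))
    norm_b = math.sqrt(sum(ratings[b][t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(user):
    """Rank unseen titles by the similarity-weighted average rating of others."""
    weighted_sum = {}
    weight_total = {}
    for other in ratings:
        if other == user:
            continue
        sim = similarity(user, other)
        if sim <= 0:
            continue
        for title, stars in ratings[other].items():
            if title in ratings[user]:
                continue  # only suggest things the user has not rated yet
            weighted_sum[title] = weighted_sum.get(title, 0.0) + sim * stars
            weight_total[title] = weight_total.get(title, 0.0) + sim
    predicted = {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)

suggestions = recommend("ana")
for title, score in suggestions:
    print(f"{title}: predicted {score:.2f} stars")
```

Ana's ratings look much more like Ben's than Cat's, so Ben's favourites get more weight when ranking what she has not seen. Note that the end user never has to think about any of this.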
James March explains that making a decision involves understanding alternatives, forming expectations about what’s likely to happen, thinking about your preferences in terms of your wants, fears, hopes, dreams in relation to those expectations, and then making a choice. That explanation really resonates. So we’re going to use it here. There’s an assumption that choice amongst alternatives is cut and dried. It isn’t. Choice is a form of knowledge – specifically: There are choices that you know you know. There are choices that you know you don’t know about. And there are choices you don’t know you don’t know. Choices themselves aren’t even really binary. There’s significant ambiguity as to what a choice really means. How many times have you heard[…]
Ask most analysts and they’ll have a very straightforward theory about how decisions are made. They dig up numbers. They put them into context. Decision makers make a decision based off those numbers and context. Except that they don’t. Enter March and Olsen. In 1972 they programmed a simulation in Fortran. It’s called a Garbage Can Model. Their idea was a solid 40 years ahead of its time. To summarize the Garbage Can Model: Institutions are organized anarchies. Problems, solutions, participants and energy go into a Garbage Can and are shaken all around. Solutions really search for problems. When you mix it with Arrow’s Impossibility Theorem, you get a much more complete picture of why groups of people don’t behave the[…]
You may hear marketing scientists or data scientists talk about simulation. Simulation, at its core, means the imitation of something real. The purpose of a simulation is to understand a system or a model. A simulation enables the analyst to take something very complex, program it, and run it again and again hundreds, thousands, or even tens of billions of times. A good simulation takes in many independent variables, and produces a single number that is meaningful to a human. (Strong recommendation to web analysts: resist the urge to produce multiple dependent variables.) A simulation: Can be fed figures that are observed in nature. Can imitate a system. Can produce figures that can be compared to figures that are observed[…]
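As a small illustration of running something again and again and getting a single meaningful number out, here is a Monte Carlo sketch using only the standard library. The scenario and every figure in it (visits per day, conversion rate, order target) are hypothetical inputs invented for the example; in practice they would be figures observed in nature. The one output is the probability of hitting the daily order target.

```python
import random

def simulate_day(visits, conversion_rate, rng):
    """Imitate one day: each visit independently converts or does not."""
    return sum(1 for _ in range(visits) if rng.random() < conversion_rate)

def probability_of_hitting_target(visits, conversion_rate, target_orders,
                                  runs=2000, seed=42):
    """Run the imitated day many times; return the share of runs at or above target."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(
        1 for _ in range(runs)
        if simulate_day(visits, conversion_rate, rng) >= target_orders
    )
    return hits / runs

# Hypothetical inputs: 1,000 visits a day, 3% conversion, target of 35 orders.
p = probability_of_hitting_target(visits=1000, conversion_rate=0.03,
                                  target_orders=35)
print(f"Chance of reaching 35 orders in a day: {p:.1%}")
```

The inputs are several independent variables, and the output is one dependent variable a human can act on. The simulated distribution of daily orders could also be compared against observed days to check whether the imitation is any good.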
Last week, Statistics Canada (StatsCan) gave us the initial population counts, broken down by area, from the 2011 census. It was a great day for analysts, and I congratulate them on delivering. A huge shoutout to The Canadian Press for a pretty sweet interactive map of the results. It’s a great mirror. It’s a great reflection of ourselves. Where people live is a pretty big indicator (not explicitly a predictor) of many other things. Like people clump alike. There are areas associated with low income. There are areas associated with rapidly rising income. There are areas associated with established wealth. As a result, education, income, and wealth are associated with where you live. If I have your name and postal code,[…]
Consider: Only 97% of analysts use Excel at some point in their careers. Only 3% of web analysts have used R. A whopping 0.3% of web analysts have downloaded PANDAS since Monday. Now consider: A whopping 97% of analysts use Excel at some point in their careers. A whopping 3% have used R. Only 0.3% of web analysts have downloaded PANDAS since Monday. Leading words shape perception. Perception shapes both what is asked and biases within what is asked next. For instance: Who the hell are the 3% who haven’t used Excel? Why is Excel such a dominant tool at 97%? And next: Why aren’t way more web analysts using R? Wow, what do those 3% of web[…]
Scott Hanselman wrote an excellent piece on App geo-location data. If there’s a Nobel Prize for writing blog titles, he would win it. The piece is entitled: It’s 2012 and your kids have an iPhone – Do you know where they are? I do. Admiration aside, yes, you’re living through one of the greatest rises of applied Geographic Information Systems (GIS), ever. It’s bigger than the launching of the first weather satellite. Or Landsat. This time, it’s millions of people equipped with sensors. And they’re doing the sensing. Many apps use geo-location data as a function of what they do, of varying utility, for the user: There are traffic congestion apps that rely on applied GIS – to crowdsource intelligence[…]