Ask most analysts and they’ll have a very straightforward theory about how decisions are made. They dig up numbers. They put them into context. Decision makers make a decision based off those numbers and context. Only that they don’t. Enter March and Olsen. In 1972 they programmed a simulation in Fortran. It’s called the Garbage Can Model. Their idea was a solid 40 years ahead of its time. To summarize the Garbage Can Model: Institutions are organized anarchies. Problems, solutions, participants and energy go into a Garbage Can and are shaken all around. Solutions really search for problems. When you mix it with Arrow’s Impossibility Theorem, you get a much more complete picture of why groups of people don’t behave the[…]
Month: February 2012
You may hear marketing scientists or data scientists talk about simulation. Simulation, at its core, means the imitation of something real. The purpose of a simulation is to understand a system or a model. A simulation enables the analyst to take something very complex, program it, and run it again and again hundreds, thousands, or even tens of billions of times. A good simulation takes in many independent variables, and produces a single number that is meaningful to a human. (Strong recommendation to web analysts: resist the urge to produce multiple dependent variables.) A simulation: Can be fed figures that are observed in nature. Can imitate a system. Can produce figures that can be compared to figures that are observed[…]
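The idea above, many inputs in and one meaningful number out, can be sketched in a few lines of Python. This is a minimal sketch, not anything from the original post: the scenario (daily revenue for a small site), the function names, the parameter values, and the choice of an exponential distribution for order size are all assumptions made for illustration.

```python
import random


def simulate_once(rng, visits, conversion_rate, mean_order_value):
    """Imitate one day: each visit either converts or not; return revenue."""
    revenue = 0.0
    for _ in range(visits):
        if rng.random() < conversion_rate:
            # Order size is drawn from an exponential distribution.
            # That distribution is an assumption, not an observed fact.
            revenue += rng.expovariate(1.0 / mean_order_value)
    return revenue


def run_simulation(runs=1000, visits=500, conversion_rate=0.02,
                   mean_order_value=40.0, seed=42):
    """Run the imitation many times and reduce it to a single number."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    totals = [
        simulate_once(rng, visits, conversion_rate, mean_order_value)
        for _ in range(runs)
    ]
    return sum(totals) / len(totals)


average_revenue = run_simulation()
print(round(average_revenue, 2))
```

Three independent variables go in (visits, conversion rate, mean order value) and one dependent variable comes out (average daily revenue). In practice the inputs would be figures observed in nature, and the output would be compared against the revenue actually observed.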
Last week, Statistics Canada (StatsCan) gave us the initial population counts, broken down by area, from the 2011 census. It was a great day for analysts, and I congratulate them on delivering. A huge shoutout to The Canadian Press for a pretty sweet interactive map of the results. It’s a great mirror. It’s a great reflection of ourselves. Where people live is a pretty big indicator (not explicitly a predictor) of many other things. Like people clump alike. There are areas associated with low income. There are areas associated with rapidly rising income. There are areas associated with established wealth. As a result, education, income, and wealth are associated with where you live. If I have your name and postal code,[…]
Consider: Only 97% of analysts use Excel at some point in their careers. Only 3% of web analysts have yet to use R. A whopping 0.3% of web analysts have downloaded PANDAS since Monday. Now consider: A whopping 97% of analysts use Excel at some point in their careers. A whopping 3% have used R. Only 0.3% of web analysts have downloaded PANDAS since Monday. Leading words shape perception. Perception shapes both what is asked and biases within what is asked next. For instance: Who the hell are the 3% who haven’t used Excel? Why is Excel such a dominant tool at 97%? And next: Why aren’t way more web analysts using R? Wow, what do those 3% of web[…]
Scott Hanselman wrote an excellent piece on app geo-location data. If there’s a Nobel Prize for writing blog titles, he would win it. The piece is entitled: It’s 2012 and your kids have an iPhone – Do you know where they are? I do. Admiration aside, yes, you’re living through one of the greatest rises of applied Geographic Information Systems (GIS), ever. It’s bigger than the launching of the first weather satellite. Or Landsat. This time, it’s millions of people equipped with sensors. And they’re doing the sensing. Many apps use geo-location data as a function of what they do, of varying utility, for the user: There are traffic congestion apps that rely on applied GIS – to crowdsource intelligence[…]
So who keeps on downvoting you on Reddit? We’ll find out. But first – three notes: You may be familiar with Reddit. If you’re not – you can read this explanation about what Reddit is. To answer that question, I downloaded a dataset that was built in early 2011 or very late 2010. The dataset is a 29MB gzip-compressed file and contains 7,405,561 votes from 31,927 users over 2,046,401 links. You can read about the methodology here. The file contains three columns – a vote, a userid, and a link. Only people who had their privacy settings set to open had that data read by an API. There is no meta-data about who these people are in real life (IRL)[…]
This is the fifth in a series of five posts about Reddit and Analytics. Previously – we covered the nature of the dataset, read histograms, generated segments, and understood that the most frequent users of Reddit are the ones who are doing the most downvoting by an astounding margin. But wait, there’s more. Recall, however, that there were over 7 million votes cast. 1.8 million were downvotes, and 5.5 million were upvotes. Read the statistics table below to verify that. Takeaways: Upvotes outnumber downvotes. The interface of Reddit itself causes upvotes to accumulate. Reddit itself is a cause of a bias – probably by design. The histogram below is by links – the content getting upvoted or downvoted. There were just[…]
This is the fourth in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week. Previously – we covered the nature of the dataset, read histograms, generated segments, and examined them. Putting a bow on it The chart below summarizes the relationship between the segments and their average vote. You can see a clear negative direction. The more one uses Reddit, the more one downvotes – even if the mean is exaggerated in the Power Pauline segment. To really hammer the point home about the origin of downvotes, take a look at the table below. It’s broken out by the segments you understand. It also contains two new variables –[…]
This is the third in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week. Previously – we covered the nature of the dataset, read histograms, and generated our segments. Now we’re going to examine each segment individually. One-Time-Olivers There are very efficient ways that statisticians quickly summarize and understand the relationship among variables. The aim here isn’t to be efficient – but to be clear. In that spirit, I give you the histogram below. Takeaways: All 4877 One-Time-Olivers voted exactly one time. You should lol. It makes sense though, right? And, the segment name should make a lot more sense. The histogram below summarizes how, on average, One-Time-Olivers voted –[…]
This is the second in a series of five posts about Reddit and Analytics. The complete thread will be posted at the end of the week. Recall that the average of all the votes a username made is called ‘averagevote’. If somebody was persistently downvoting links, they’d have a negative number. If they upvoted everything they saw, they’d have an averagevote of +1. Read the histogram below. The three takeaways are: Negativity follows a reverse long tail. (It really happens – see how the figures fall away to the left) On average, usernames upvoted what they saw (average 0.79). There are bumps at 0 (related to a methodological note) and at -1. By now, two of my good friends in London[…]
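The ‘averagevote’ figure described in the excerpt above is simple arithmetic: add up a username's votes and divide by how many they cast. Here is a minimal sketch, assuming an upvote is coded as +1 and a downvote as -1. The function name and the sample rows are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def average_votes(votes):
    """Return {username: mean vote} from (username, vote) pairs.

    Assumes each vote is +1 (upvote) or -1 (downvote).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for username, vote in votes:
        totals[username] += vote
        counts[username] += 1
    return {user: totals[user] / counts[user] for user in totals}


# Hypothetical sample rows, for illustration only.
sample = [
    ("alice", 1), ("alice", 1),   # upvotes everything
    ("bob", -1), ("bob", -1),     # persistently downvotes
    ("carol", 1), ("carol", -1),  # splits evenly
]

averagevote = average_votes(sample)
for user, value in sorted(averagevote.items()):
    print(user, value)
```

A username that upvotes everything lands at +1, one that only downvotes lands at -1, and an even split lands at 0. A histogram of these per-username values is what the takeaways above are reading.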