How to predict how many visits a website will receive on a given day
Predictive analytics is somewhat mysterious. So, let’s shed some light on it.
(Note that I’m simplifying this quite a bit to be accessible.)
The first step in predictive analytics is to understand what you’re predicting. We’ll call this the Y variable.
In this instance, the question is ‘how many visits from Boston can I expect on a given day?’ My Y will be ‘Visits’.
I’m curious about it.
Have some discipline. I see way too many analysts change the Y variable before their investigation is through.
The second step is to identify all the variables that might be associated with a variation in Y. These might include factors like paid media, search, new visits, returning visits – and date. Then there are paid campaigns, posting new content, social campaigns, traditional media spend, promotions, and so on. Day of the week is another key variable, along with statutory holidays, and extending out to other factors like weather and creativity.
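Day of the week usually enters a regression as a 0/1 dummy variable. Here is a minimal sketch of how a date could be turned into the weekend flag used later in this post. The function name `is_the_weekend` and the sample dates are illustrative assumptions, not part of the original analysis.

```python
from datetime import date

def is_the_weekend(day: date) -> int:
    """Return 1 for Saturday or Sunday, 0 for a weekday."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return 1 if day.weekday() >= 5 else 0

# Sample dates chosen only to show both outcomes.
print(is_the_weekend(date(2011, 11, 12)))  # a Saturday
print(is_the_weekend(date(2011, 11, 16)))  # a Wednesday
```

The same approach works for other calendar variables, such as a flag for statutory holidays, provided you supply the list of holiday dates yourself.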
The third step is to extract, transform, and load the data you CAN actually access. You can spend months fighting to build an absolutely complete model, or you CAN start putting together a story with the facts that are available. I chose action over inertia. You should too.
That date field is usually pretty bad to extract, transform, and load. There are functions in both Excel and SPSS that handle dates, with some difficulty. Devils abound in the details around which date it is where in the world. If your installation is set to Eastern Time, and most of your traffic comes from Australia, you’ll be lagged by a day. You ought to adjust the figures using the appropriate offset.
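A rough sketch of that offset adjustment, using only Python's standard library. The fixed offsets below are assumptions for illustration (Eastern Standard Time and Sydney summer time); real zones shift with daylight saving, so a production version would want a proper time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for illustration; real zones shift with daylight saving.
EASTERN = timezone(timedelta(hours=-5))  # Eastern Standard Time
SYDNEY = timezone(timedelta(hours=11))   # Sydney during its summer time

def visitor_date(logged, reporting, visitor):
    """Return the calendar date the visitor experienced for a logged timestamp."""
    # Attach the reporting zone, convert to the visitor's zone, keep only the date.
    return logged.replace(tzinfo=reporting).astimezone(visitor).date()

logged = datetime(2011, 11, 15, 20, 0)  # 8 p.m. in the reporting zone
print(visitor_date(logged, EASTERN, SYDNEY))  # 2011-11-16
```

A visit logged on the evening of the 15th in the reporting zone happened around midday on the 16th for the visitor, which is exactly the kind of lag that skews a day-of-week variable.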
The figure below is what I could extract from Google Analytics in about an hour. (Collinearity abounds!)
The fourth step is to run the math against your model.
I use SPSS to run a regression. If you don’t have SPSS, you can try using open source programs like Octave or R. The reason for using software is that regression is annoying to do by hand. I didn’t have a copy of SPSS at my first research position, so I had to code out linear regression in Excel. I learned a lot, but it is not expedient!
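For a sense of what the software is doing, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. It is not the SPSS procedure itself, and the toy numbers are made up for illustration. They are NOT the Boston figures.

```python
def fit_simple_regression(x, y):
    """Ordinary least squares for y = constant + b * x with one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    constant = mean_y - b * mean_x
    return constant, b

# Made-up toy week, not real traffic: 1 marks a weekend day.
is_weekend = [0, 0, 0, 0, 0, 1, 1]
visits = [5, 6, 4, 5, 5, 3, 3]

constant, b = fit_simple_regression(is_weekend, visits)
print(constant, b)
```

With a 0/1 predictor, the constant works out to the weekday average and the slope to the weekend average minus the weekday average, which is a handy sanity check on any regression output.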
The figure below is the output from the software.
The way to read the table is Y = Constant + B1(X1) + B2(X2).
So, Visits = 4.888 – 1.872(istheweekend).
If it’s the weekend, I can predict Visits = 4.888 – 1.872(1), which equals 3.016, or about 3 visits.
If it’s not the weekend, I can predict Visits = 4.888 – 1.872(0), which equals 4.888.
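The same arithmetic as a small sketch, using the constant and coefficient quoted above. The function name `predict_visits` is just an illustrative choice.

```python
CONSTANT = 4.888    # constant from the regression output quoted above
B_WEEKEND = -1.872  # coefficient on istheweekend

def predict_visits(istheweekend: int) -> float:
    """Predicted Boston visits for a weekend (1) or weekday (0)."""
    return CONSTANT + B_WEEKEND * istheweekend

print(round(predict_visits(1), 3))  # 3.016
print(round(predict_visits(0), 3))  # 4.888
```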
Not bad for Boston traffic! And I understand the impact of a single variable on visits.
My dataset is incredibly spiky. So, what’s causing some of that spikiness? I flagged all the dates on which I posted new content, reran the math, and got the table below.
The model above is the best. It explains 12.7% of the variance in the set.
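“Variance explained” is the R-squared statistic: one minus the ratio of leftover (residual) variation to total variation. A minimal sketch of the calculation follows, on made-up toy numbers rather than the Boston data, so the result will not match the 12.7% above.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_residual / ss_total

# Made-up toy week: observed visits and a model's predictions for the same days.
actual = [5, 6, 4, 5, 5, 3, 3]
predicted = [5, 5, 5, 5, 5, 3, 3]

print(round(r_squared(actual, predicted), 3))  # 0.741
```

An R-squared of 12.7% means most of the day-to-day variation is still unexplained, which is why the coefficients are best read as a first idea rather than a forecast.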
The equation is: Visits = 4.496 – 1.76(istheweekend) + 2.482(newpost).
I can tell – according to this version of reality – that if I want the maximum bump from Boston, posting during the weekday is best. And I can tell the proportional impact of each variable.
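To see all four combinations at once, here is a short sketch that plugs each one into the equation above. Only the coefficients come from the table; the function name is an illustrative choice.

```python
def predict_visits(istheweekend: int, newpost: int) -> float:
    """Second model from the post: weekend flag plus new-post flag."""
    return 4.496 - 1.76 * istheweekend + 2.482 * newpost

for istheweekend in (0, 1):
    for newpost in (0, 1):
        day = "weekend" if istheweekend else "weekday"
        post = "new post" if newpost else "no post"
        print(day, post, round(predict_visits(istheweekend, newpost), 3))
```

The weekday with a new post comes out highest (6.978) and the weekend without one lowest (2.736). Because this model has no interaction term, the bump from a new post is the same 2.482 on any day; weekday posts simply start from a higher baseline.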
Sometimes this answer is good enough. There are more advanced methods – like curvilinear regression, machine learning, and neural networks. There are ways to introduce more variables into the equation. But typically – this method is sufficient to get a first idea about the relationships among variables and their relative importance, rooted in fact, as opposed to gut bias.
The fifth step is to make decisions based on scenarios.
If you take this equation and plot it out, you can engage in a few what-ifs. Would writing more weekend-friendly material result in a smaller weekend beta? Would increasing the frequency of new posts drastically improve the performance of the website? If so, by how much? The size of the newpost beta, as compared to the total number of Boston visits per day, hints at that relative strength.
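One way to play out the posting-frequency what-if is to add up predicted visits over a week. This sketch assumes all posts land on weekdays and, more importantly, that the coefficients stay the same as posting frequency changes. The model cannot confirm that assumption, so treat the totals as a scenario, not a forecast.

```python
def predict_visits(istheweekend: int, newpost: int) -> float:
    """Second model from the post: weekend flag plus new-post flag."""
    return 4.496 - 1.76 * istheweekend + 2.482 * newpost

def weekly_visits(weekday_posts: int) -> float:
    """Predicted visits for one week with 0 to 5 posts, all on weekdays."""
    if not 0 <= weekday_posts <= 5:
        raise ValueError("weekday_posts must be between 0 and 5")
    # Post days come first; the order does not affect the weekly total.
    weekdays = sum(
        predict_visits(0, 1 if day < weekday_posts else 0) for day in range(5)
    )
    weekend = 2 * predict_visits(1, 0)
    return weekdays + weekend

for posts in (0, 1, 3, 5):
    print(posts, round(weekly_visits(posts), 3))
```

Under these assumptions, each extra weekday post adds 2.482 predicted visits to a baseline week of roughly 28, which puts a number on “by how much”.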
That’s the power of predictive analytics.
Siri and Search
Gary Morgenthaler had a few interesting statements to make:
“Therefore, when Siri was an independent company, its plan was to map these domains deeply and seamlessly to automate transactions for its users within them. For example, “Buy that Steve Jobs biography book and send it to my dad”; “Send a dozen yellow roses to my wife”; “Book me the usual table for 2 tonight at 8 p.m. at Giovanni’s”; and “Get me 2 box seats for the Giants game on Saturday.”
Then comes the question of what solves our biggest problems. Ultimately, Siri’s value is that of automation and removing “friction” on the Internet. Siri achieves this by: (1) understanding speech input in natural language form, (2) mapping user requests against its knowledge base (i.e., ontological domains) and (3) activating software “agents” to interact with Internet service providers to fulfill user requests.”
Source: TechCrunch
Let’s just forget Google for a minute and focus in on this combination of technologies.
- Understand.
- Map.
- Act.
That’s the general design pattern for a whole range of applications.
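A toy sketch of the understand, map, act pattern is below. It is in no way how Siri is built: the keyword table, the domain names, and every function name here are hypothetical stand-ins chosen to show the shape of the pipeline.

```python
# Hypothetical knowledge base: a trigger word mapped to a service domain.
DOMAINS = {
    "book": "restaurant",
    "buy": "shopping",
    "send": "delivery",
}

def understand(utterance):
    """Stand-in for language understanding: lowercase word tokens."""
    return utterance.lower().replace(",", " ").split()

def map_to_domain(tokens):
    """Map the request onto the first known domain, or None if nothing fits."""
    for token in tokens:
        if token in DOMAINS:
            return DOMAINS[token]
    return None

def act(domain, utterance):
    """Stand-in for dispatching a software agent to fulfill the request."""
    if domain is None:
        return "no agent available for: " + utterance
    return "agent[" + domain + "] handling: " + utterance

def handle(utterance):
    tokens = understand(utterance)
    return act(map_to_domain(tokens), utterance)

print(handle("Book me the usual table for 2 tonight"))
print(handle("What is the capital of Peru"))
```

The second request falls outside every mapped domain, which illustrates the limit of the pattern: it only acts on what has been mapped ahead of time.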
Certainly nothing new here.
They’ve solved a good problem. There are certain use cases for which Siri is a great solution.
He ignores the rest of the problem space. And that’s just fine. I don’t expect him to point out the subset of infinite use cases that Siri is woefully inadequate for.
Barriers, like a small keyboard, are soon to be resolved by virtual keypads and a range of next-generation hand gestures that are sensed, not tactilely received. I don’t see them as insurmountable.
Even Star Trek TNG made use of both voice and physical commands.
Siri is not a Google-search killer.
It is a nice complement.
Nov
09
- Christopher Berry
- Data Science, Design Thinking
Data Science
Data Science is the mix of computer science, user experience, and statistics.
The aim of data science should be:
- to make things better
- by influencing people and things to make better decisions,
- by making people and things more aware of better alternatives,
- based on better algorithms and more relevant data.
Language kept intentionally vague to set up the ‘well that could be anything’ argument when it suits me later.
If you do it right, nobody is really aware of the complexity of what just happened to them. The point is not to experience data. The point is to experience…an experience. And be better off for it!
And, the most interesting part is that it’s not really driven by humans with hidden agendas. Though, that could play a part. It’s driven by machines which generate rules that most designers don’t understand fully.
Haven’t heard of Data Science? You’re not alone. It’s only just become a ‘thing’ lately.
The usual fight for the soul of Data Science (the language, identity, ego) has begun in earnest. You can read the editorial summary here. This will go on for the better part of a decade, and frankly, nobody outside of the emerging data science community is really going to care. But it’ll be important to a few. And they’ll make it a big deal, solely because language contains bias about beliefs, and don’t question my damned beliefs, dammit.
I don’t have much of a dog in that fight. I’d much rather get to the good stuff.
Why am I excited and optimistic about the prospects for Data Science?
Never before has so much data about so much meant so little to so many. The world is filled with waste and genuinely bad things. What if you could make sense of more of it? What would you do then? How much better off would we be?
This is a bit beyond the novelty of Freakonomics.
You may recall a line of reasoning that James Burke once put forward in his series Connections. He argued that we tend to believe that technological advancement causes the world to become better, when, in all reality, every technological advancement has made the environment worse off while making people relatively better off. There’s been a tradeoff. It seems that technological advancement is at odds with sustainability.
But does it have to be?
By becoming more aware of cause and effect as individuals, groups, communities, companies, organizations and societies – can we become better?
It is, after all, not just about tracking the world. It’s about making sense of all that data too. Thinkers like Jeff Jonas have been putting forward ideas about sensemaking for some time, and I take no credit for it. It’s not so much that the data excites me. It’s the opportunity that that data opens up.
I think there’s good reason to believe things can be better.
Picture related. Without meaning, how can you make sense of anything?
***
I’m Christopher Berry.
I bridge the gap between marketing science and data science.
Oct
28
- Christopher Berry
- Data Science
How consumers use mobile for shopping
How consumers are using mobile to shop IRL (In Real Life) is of paramount interest now that mobile has finally arrived. A few figures to run through. The first, below, describes what consumers report they want from mobile phone applications, for the holidays, in August 2011.
A common behavior, well known to clicks-and-bricks retailers, is that consumers will research products before coming in store to buy them. This is especially true of electronics goods, but I suppose it’s conceivable they do it for home appliances, automotive purchases, and anything else that is generally of high consideration. Mobile offers the capability of researching while you’re physically in the store. And, since most stores are now ghost towns, it enables the consumer to help themselves.
Expect more of that in December 2011.
Note the desire for coupons and sale information. People want deals, dammit. It’s not exactly something I’d be pushing if I were a mobile marketer. Why cannibalize my in-store sales? Well – I might think of a way to drive urgency using the device. But I wouldn’t want to throw a “20% off” display ad just because I want proof linking the mobile channel to in-store sales. Certainly, there could be a mechanism. A reward of some type, perhaps.
Finally, there’s that 32% figure that sticks out. ‘Buying products’. It’s 2003 all over again and smartphones are to mobile commerce as broadband was to ecommerce.
The second set of statistics follows below. They used a control group and they’re reporting the differences. The results suggest that mobile is more effective at driving a number of brand metrics (not direct attribution metrics like a web analyst might assume). They’re reporting on the relative impact of the channel on self-reported attitudinal changes, post exposure.
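For readers unfamiliar with control-group reporting, the “difference” is typically the exposed group’s rate minus the control group’s rate, in percentage points. Here is a minimal sketch with hypothetical survey counts. The numbers are invented for illustration and are not Dynamic Logic’s.

```python
def point_lift(control_rate: float, exposed_rate: float) -> float:
    """Exposed minus control, expressed in percentage points."""
    return (exposed_rate - control_rate) * 100

# Hypothetical counts: respondents reporting purchase intent in each group.
control_yes, control_n = 120, 1000
exposed_yes, exposed_n = 150, 1000

lift = point_lift(control_yes / control_n, exposed_yes / exposed_n)
print(round(lift, 1))  # 3.0
```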
A summary states:
“According to Dynamic Logic, there are three important factors that drive a successful mobile campaign. They are the location of a brand name or logo within a mobile ad matters: left-side brand placement is generally most effective and has a strong impact on advertising recall; clear and persistent branding is important for brand awareness and a strong call-to-action encourages interactivity and engagement and helps drive purchase intent.” Source.
The takeaway is not “use mobile to drive awareness”. That is not a good takeaway. Mobile is not a mass awareness channel, any more than paid search is. It’s not the way the channel works and it’s certainly not the way consumers want the channel to be used with them. Do you really want to hit people with an SMS coupon every time they visit Deborah in accounting at the north side of the building? (It’s just within the 200m radius of a Starbucks.) That’s the wrong takeaway, even if it is highly likely that awareness is higher. (It better be, there’s less on the screen to look at.)
Mobile, good mobile, forces much more discipline. It demands subtraction. It demands that choices be made. This isn’t a corporate webpage where everything can be added.
There’s more discipline because there’s more constraint.
Finally, there’s Korea. It’s the last piece of evidence I’ll put forward.
The video below explains how Korean marketers are helping people rescue otherwise wasted time. In this instance, it’s shopping from the subway, using smartphones and codes.
This represents a fairly impressive increase in productivity. Mobile enables consumers to be more productive in their lives by converting what was previously wasted opportunity into rescued time. You’re also resurrecting outdoor display advertising and commanding direct consumer attention AND action. It’s awesome and goes well beyond ‘click this QR code to see our awesome marketing microsite’.
Recall the product adoption lifecycle. Innovators will try things simply because they’re novel. There’s a long chasm. Is that chasm ever brutal. At the other side of it there are early adopters. Early adopters will try things because it’s obvious that they will be useful. What we’re seeing here is some evidence that we’re through the chasm, at least when it comes to porting very common digital activities that used to happen on a laptop over to a mobile device. The grayer area is the role of portable devices (tablets) in driving changes in consumer behavior at mass.
How would you use mobile, not so much to increase awareness (it’s not a mass channel) but to complete the action-purchase portion of the conversion cycle?
Oct
27
- Christopher Berry
- Mobile Analytics







Nov
16