Web analytics uses clickstream data.

It’s data that is:

  • Generally Anonymous
  • Generally Aggregated
  • Heavily Abstracted

Most commercial web analytics software abstracts away the raw data with fairly usable interfaces. You’ll be hard pressed to find many people these days who know how to work with server log data. Yet, it’s still possible to segment a population of browsers based on the characteristics of the browser, computer, and reverse geographic lookup.

That is to say, I can query, through the software, the differences between IE browsers in Toronto originating from Reddit and, say, Chrome browsers in New York originating from search. If it’s an eCommerce site, I may even be able to compare the differences in purchasing value.
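As a rough sketch of what such a segment comparison amounts to once the interface is stripped away, assume each session is a record with hypothetical `browser`, `city`, `source`, and `revenue` fields (these names and the sample rows are made up for illustration, not taken from any particular analytics product):

```python
from collections import defaultdict

# Hypothetical session records; field names and values are assumptions.
sessions = [
    {"browser": "IE", "city": "Toronto", "source": "reddit", "revenue": 0.0},
    {"browser": "IE", "city": "Toronto", "source": "reddit", "revenue": 40.0},
    {"browser": "Chrome", "city": "New York", "source": "search", "revenue": 25.0},
    {"browser": "Chrome", "city": "New York", "source": "search", "revenue": 0.0},
    {"browser": "Chrome", "city": "New York", "source": "search", "revenue": 35.0},
]


def summarize(records):
    """Aggregate sessions by (browser, city, source) segment."""
    totals = defaultdict(lambda: {"sessions": 0, "orders": 0, "revenue": 0.0})
    for r in records:
        seg = totals[(r["browser"], r["city"], r["source"])]
        seg["sessions"] += 1
        if r["revenue"] > 0:  # a session with revenue counts as one order
            seg["orders"] += 1
            seg["revenue"] += r["revenue"]
    return dict(totals)


segments = summarize(sessions)
for key, seg in segments.items():
    conversion = seg["orders"] / seg["sessions"]
    print(key, f"conversion={conversion:.2f}", f"revenue={seg['revenue']:.2f}")
```

Note that the keys are attributes of the browser and the visit, not of a person. Nothing in the record identifies who was behind the session.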

Most websites are not eCommerce enabled.

Most database analysts deal with customer transaction data.

It’s data that is:

  • Personally Identifiable
  • Generally unaggregated
  • Transactional

These tend to be very large databases with very poor interfaces. And, generally speaking, people are segmented based on what they purchase and the marketing treatments they have received.

The three key letters to understand here are RFM:

  • Recency
  • Frequency
  • Monetary

That is to say, I can query, through specialized software, the differences between customers in Toronto originating from direct mail and, say, customers who have recently changed their address to New York originating from a campaign in 2002. And, applying some more analytical magic (and this part is key: when nobody is interfering), I can work out lifetime value (LTV).
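The three RFM measures listed above can be sketched from a transaction table in a few lines. This is a minimal illustration, assuming each transaction is a hypothetical `(customer_id, purchase_date, amount)` tuple. The sample data, the `rfm` helper, and the `as_of` date are all invented for the example, and the sketch stops at the raw measures rather than attempting LTV:

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("c1", date(2010, 1, 5), 50.0),
    ("c1", date(2010, 3, 1), 30.0),
    ("c2", date(2009, 11, 20), 200.0),
]


def rfm(rows, as_of):
    """Return recency (days), frequency (count), and monetary (sum) per customer."""
    acc = {}
    for customer_id, day, amount in rows:
        rec = acc.setdefault(customer_id, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)  # most recent purchase date
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer_id: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer_id, rec in acc.items()
    }


table = rfm(transactions, as_of=date(2010, 4, 1))
print(table)
```

The whole calculation hinges on `customer_id`: a stable, personally identifiable key that ties every purchase back to one person.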

Web traffic contains a massive amount of data that is not from customers. You have prospects. You have competitors. You have job applicants. If it’s not set up correctly, it contains traffic from your own company. And from your agencies. It contains people who are researching a product and who will never return to your site again. It may even contain robots, and it may be missing browsers that don’t execute JavaScript.
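To make the "set up correctly" part concrete, here is a minimal sketch of the kind of filter that separates internal and robot traffic from everything else. The network range, the user-agent markers, and the `classify` helper are assumptions chosen for illustration. Real setups vary, and user-agent matching in particular is only a crude heuristic:

```python
import ipaddress

# Assumed values for illustration only: your own office range and a few
# substrings that commonly appear in crawler user agents.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
BOT_MARKERS = ("bot", "crawler", "spider")


def classify(hit):
    """Label a hit as 'robot', 'internal', or 'external'."""
    user_agent = hit["user_agent"].lower()
    if any(marker in user_agent for marker in BOT_MARKERS):
        return "robot"
    address = ipaddress.ip_address(hit["ip"])
    if any(address in network for network in INTERNAL_NETWORKS):
        return "internal"
    return "external"


hits = [
    {"ip": "10.1.2.3", "user_agent": "Mozilla/5.0"},
    {"ip": "198.51.100.9", "user_agent": "Googlebot/2.1"},
    {"ip": "203.0.113.7", "user_agent": "Mozilla/5.0"},
]
labels = [classify(h) for h in hits]
print(labels)
```

Even after this filter, the "external" bucket still mixes customers with prospects, competitors, and job applicants. Nothing in a raw hit tells them apart.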

You’re examining a superset of traffic, and you’re examining browsers. You’re not really looking at people across multiple sessions, either. At least, not in the sense that others have been led to believe.

Rudimentary RFM analysis has been attempted at the Clickstream level, with mixed results. It’s possible to do. But I’d wager it’s actively practiced in fewer than 5% of all web analytics practices out there.

There is a third category of data that has since been lumped in – called ‘VOC’, or ‘Voice of Customer’ data. It’s Market Research surveys on the website. This data, again, is not linked in any way to identifiable customers. And, it also contains all those other populations found on the website.

The upshot is that traditional customer analytics, or database mining on customer data, is extremely narrow, and that web analytics is extremely broad.

I know it’s popular to say that clickstream data is customer intelligence, but knowing what we know about the data itself, that it contains more than just customers in the aggregate, do we really believe that?


I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca

5 thoughts on “Do we really believe that clickstream data is customer intelligence?”

  1. I never believed it. “Visitors” are not “customers” or even “prospects”. If you incentivize the anonymous visitors to identify themselves as prospects or customers, then you’re talking.

    Unfortunately, most web analysts are not able to handle that yet, so chances are rather high that the real customer analysts will come to us and say “Move over!”.

  2. Thank you for the comment Jacques,

    Is there hope that web analysts may actually develop more complex causal models in an effort to untangle the click stream from the individual transaction record?

  3. Totally agree with Jacques & you – unless you can tie clickstream to specific customers, you obviously can’t call it “customer intelligence”… Not sure you can even call it “intelligence”!

  4. Pouls29 says:

    I liken it to fishing. The fishing line in the water is like the anonymous data. It does nothing, useless. But then when there’s a fish on the end (you identify the user), then you’ve got some real action and can use that fishing line (click-stream data)!

  5. @Stephane that was much more succinct than the entire blog post. Thank you.

    @Pouls what can we do to bait the hook? Huuuuh? Huuuuh? 🙂
