Information Retrieval from Analytics Systems

Posted on October 11, 2016; edited on August 7, 2021, by Christopher Berry

In general, information retrieval from analytics systems becomes harder with the degree of customization (It gets harder to find things over time). That customization is frequently an expression of the values of a culture over time. The inertia of the technical debt caused by early customization is greater than the inertia of a data driven culture. There are no silver bullets.

The rest of this post unpacks that paragraph.

Information retrieval from analytics systems becomes harder with the degree of customization

Assume a vanilla implementation of Open Web Analytics. Or Google Analytics. Or Adobe Analytics.

It’ll tell you a lot about a web system on its own.

The optimization objective that is at the core of the business will typically get specifically tagged, and undergo a greater degree of metadata collection than others.

That customization is going to make information retrieval from that system harder. Because you’re asking any given user to understand the strategy of the firm. Then getting them to that location in the instrument to observe that metric is a challenge.

And as the number of managers charged with moving various levers increases, each will demand different pieces of metadata enhancement, in different places. This compounds the complexity asked of the system.

Increasing complexity of the data model is an inherent property of data driven cultural progress.

That customization is frequently an expression of the values of a culture over time

Systems of thought, conventions, roles, vocabulary, attitudes, behaviors, and routines are all parts of culture.

Cultures change. No culture can survive rapid inundation of new hires exceeding 60%. A culture may become encrusted in salt if it undergoes rapid evaporation of layoffs exceeding 60%. Managers are hired. Some are fired. Some may be promoted. Others may move on. Pedantic fights may occur. Slogans are crafted. Slogans are forgotten.

Each evolution may leave its fingerprints on an analytics system. Nancy may have been all about ‘engagement!’, and as a result, over ninety distinct ‘moments of truth’ touchpoints may have been coded. John may have been all about ‘conversion!’, and as a result seven landing pages may have been tagged in a very specific way. There may have been that time that Jean tried to sell T-shirts for some reason, so there may be twenty ecommerce variables that were set up.

The comings and goings of people and priorities leave a deposition of customization on an analytical system over time. These do not occur at random. They’re caused directly by culture.

Social debt has a direct impact on the volume of technical debt accrued.

The inertia of the technical debt caused by early customization is greater than the inertia of a data driven culture

A feature, built by a clique, may persist in a technical infrastructure for months, years, even decades after its builders have gone. There are many reasons for this, but the greatest may be the insidious technical debt caused by hidden customers of that data.

Let’s say, a new person coming into an environment examines a given implementation and sees a strange variable. Let’s say that it’s 10 units of data long, concatenated with a pipe (|). They ask a few people if it’s in use anywhere and everybody shrugs. They move to deprecate. But at the last hour, somebody in a department they didn’t know existed relies on that thing to populate a quarterly report. So it has to stay. Or worse, their report breaks and 80 days later, they attack the repayment.
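To make the strange variable concrete, here is a minimal sketch in Python. The sample value, the function name, and the placeholder field names are all hypothetical; the only detail taken from the scenario above is that the variable holds ten values joined by a pipe. It shows what the newcomer can and cannot recover without a data dictionary.

```python
from typing import Optional

# Hypothetical sample value: ten fields concatenated with a pipe.
raw_value = "a1|web|2016|en|ca|0|1|promo|x|7"


def split_custom_variable(raw: str, names: Optional[list[str]] = None) -> dict[str, str]:
    """Split a pipe-delimited variable into fields.

    Without a data dictionary (names=None) the best we can do is label
    each position field_1..field_n. The meaning of each slot is not
    recoverable from the value itself.
    """
    parts = raw.split("|")
    if names is None:
        names = [f"field_{i}" for i in range(1, len(parts) + 1)]
    if len(names) != len(parts):
        raise ValueError(f"expected {len(names)} fields, got {len(parts)}")
    return dict(zip(names, parts))


fields = split_custom_variable(raw_value)
print(fields)
```

Splitting the value is trivial. Knowing what position six means, and who still depends on it, is the part that requires asking around.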

Metadata tends to get a life of its own. And if it isn’t used by an unknown, it’s frequently kept around out of fear that it could be useful someday. Or they might need it later. Or because nobody feels as though it’s truly broken so they shouldn’t fix it. Or that the harm of keeping it (confusion, opportunity cost, misinterpretation) doesn’t rise to the level of the perceived potential upside of keeping it.

Once captured, it is very difficult to kill a piece of metadata. It becomes locked in.

There are no silver bullets

In general, the incremental confusion caused by the metadata **you** need to do your job isn’t a problem. The problem is all of the confusion caused by **others** adding metadata **they** probably don’t really need to do theirs.

Everybody can agree, in principle, that 5,000 automated reports is a lot. Nobody can agree, in practice, that their own automated report should be the one that gets cut.

Like in most systems, the benefits are localized and the externalities are socialized. Networks of knowledge become entangled knots of mess.

There are no quick wins. There is no listicle of the top 7 things you need to do to keep your metadata strategy tidy that you’ll find here.

It’s so easy to write ‘have a data strategy’. It’s so much harder to have grown-up conversations about the real, hard choices that have to be made. That’s what a strategy is. It’s choices amongst weighted alternatives.

If it’s to be a data commons, set the expectation that there will be a tragedy of the commons. Some cultures embrace this. It may well be possible for a data driven culture to thrive with a highly heterogeneous metadata structure. Either people access the data dictionary and thrive. Or they die. It’s called darWINian for a reason. That kind of a strategy leads to periodic resets. Mass extinctions. An entire system is ripped right out and a new one is put in. The point, often, isn’t that the new system is really any better. The point is to reset the data model. The churned don’t have a seat at the table. As a result, their echoes rarely make it into the next system. That’s one outcome.

If it is not to be a data commons, set the expectation that there isn’t to be a tragedy of the commons. Data governance means order. It means that hidden customers are systematically exposed. It means hoarding is checked. It means somebody is going to be unpopular. It leads to greater sustainability.
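One way to picture “hidden customers are systematically exposed” is a registry in which every custom variable lists its declared consumers. The sketch below is an illustration under stated assumptions, not a feature of any analytics product: the `Variable` class, the `deprecation_report` function, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One piece of custom metadata and the people or reports that rely on it."""
    name: str
    owner: str
    consumers: set[str] = field(default_factory=set)


def deprecation_report(registry: list[Variable]) -> tuple[list[str], dict[str, list[str]]]:
    """Return (variables with no declared consumer, variables still in use)."""
    unused = sorted(v.name for v in registry if not v.consumers)
    in_use = {v.name: sorted(v.consumers) for v in registry if v.consumers}
    return unused, in_use


# Hypothetical registry entries, loosely modeled on the examples in this post.
registry = [
    Variable("tshirt_sku", owner="Jean"),
    Variable("landing_page_tag", owner="John", consumers={"weekly dashboard"}),
    Variable("legacy_pipe_var", owner="unknown", consumers={"quarterly report"}),
]

unused, in_use = deprecation_report(registry)
print("No declared consumer:", unused)
print("Still in use:", in_use)
```

The code is the easy part. The registry is only as good as the rule that undeclared consumers get no guarantees, and enforcing that rule is where somebody becomes unpopular.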

You have choices.