How do ranking algorithms work?

At the highest level:

  • A machine accepts a series of independent variables.
  • A machine interprets those variables.
  • A machine produces an output that is, ideally, predictive of a dependent variable.
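Those three steps can be sketched in a few lines of Python. This is a toy illustration, not how any of the companies below actually rank: the `Item` class, the feature names, and the `WEIGHTS` values are all hypothetical, and the "interpretation" step is assumed to be a plain weighted sum.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    features: dict  # the independent variables, keyed by (hypothetical) name


# Hypothetical weights: how much each independent variable counts.
WEIGHTS = {"text_match": 0.6, "freshness": 0.1, "popularity": 0.3}


def score(item: Item) -> float:
    """Interpret the variables: here, nothing more than a weighted sum."""
    return sum(w * item.features.get(k, 0.0) for k, w in WEIGHTS.items())


def rank(items: list) -> list:
    """Produce the output: items ordered from highest to lowest score."""
    return sorted(items, key=score, reverse=True)


items = [
    Item("page_a", {"text_match": 0.9, "freshness": 0.2, "popularity": 0.1}),
    Item("page_b", {"text_match": 0.4, "freshness": 0.9, "popularity": 0.8}),
    Item("page_c", {"text_match": 0.7, "freshness": 0.5, "popularity": 0.5}),
]
ranked = rank(items)
print([item.name for item in ranked])  # ['page_c', 'page_a', 'page_b']
```

Whether that ordering is any good is a separate question: it depends entirely on whether the score tracks the thing you actually care about.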

An algorithm is useful to the extent that its output is predictive of the dependent variable.

For instance:

  • The usefulness of the Google Search algorithm depends on how relevant the results are to the query.
  • The usefulness of the Facebook GraphRank algorithm depends on how relevant the items in the news feed are to the user.
  • The usefulness of the Netflix algorithm(s) depends on how relevant and divergent the recommendations are for the household.

All three companies use algorithms to rank content and display it to you. Your behavior, in one way or another and in one context or another, is an optimization objective.

The sophistication of each machine varies with the context:

  • Google uses vector math on sparse matrices to calculate a portion of its ranking.
  • Facebook more than likely uses an alternative form of vector math on more uniform matrices to calculate a portion of its ranking.
  • Netflix uses a powerful form of supervised and unsupervised machine learning to generate its recommendations.

These algorithms work by taking very large data sets, sorting them, and reducing them into very usable segments. They ought to be tuned against a single real number that represents the dependent variable they’re optimizing for.
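Here is a minimal sketch of what tuning against a single real number can look like. Everything in it is an assumption made for illustration: the `ROWS` data is made up, the score is a blend of two hypothetical features, and the single number is a pairwise concordance (the share of item pairs that the score puts in the same order as the observed outcome). Other metrics would serve just as well.

```python
from itertools import combinations

# Hypothetical data: (feature_a, feature_b, observed outcome) for each item.
ROWS = [
    (0.9, 0.1, 5.0),
    (0.2, 0.8, 9.0),
    (0.5, 0.5, 7.0),
    (0.1, 0.3, 2.0),
]


def concordance(weight_a: float) -> float:
    """Share of item pairs that the score orders the same way as the outcome."""
    scores = [weight_a * a + (1 - weight_a) * b for a, b, _ in ROWS]
    outcomes = [y for _, _, y in ROWS]
    pairs = list(combinations(range(len(ROWS)), 2))
    agree = sum(
        (scores[i] - scores[j]) * (outcomes[i] - outcomes[j]) > 0
        for i, j in pairs
    )
    return agree / len(pairs)


# Try a handful of candidate weights and keep the one with the best number.
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
best = max(candidates, key=concordance)
print(best, concordance(best))  # 0.25 1.0
```

The point of collapsing everything into one number is that "better" and "worse" become comparable, so the tuning loop has something to climb.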

How many independent variables are enough?

There’s a tension between those who argue ‘those who have the most cases win’ and those who argue ‘those who have the most independent variables win’. Neither side is totally right, of course; it all comes down to context. Thanks to advances in machine learning, data scientists and marketing scientists can take as few as three independent variables and turn them into several hundred derived variables. Not all of them will be predictive. Worse, because they are all computed from the same few inputs, they lose their independence.
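One way to get from three variables to several hundred is polynomial expansion: take every product of the inputs up to some degree. That is only one technique among many, and I am assuming it here purely for illustration. The `expand` and `pearson` helpers and the sample values below are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod, sqrt


def expand(row, max_degree):
    """Return every product of the inputs, from degree 1 up to max_degree."""
    return [
        prod(combo)
        for degree in range(1, max_degree + 1)
        for combo in combinations_with_replacement(row, degree)
    ]


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


row = (2.0, 3.0, 5.0)  # three hypothetical independent variables
derived = expand(row, max_degree=10)
print(len(derived))  # 285 derived variables from 3 inputs

# A derived variable moves with its source, so the two are not independent.
x = [float(v) for v in range(1, 11)]
x_squared = [v * v for v in x]
print(round(pearson(x, x_squared), 3))
```

The second print shows the cost: a variable and its own square are very strongly correlated over this range, which is exactly the loss of independence described above.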

In general, the relationship between the number of independent variables and the complexity of the equation can be managed. A few very clever methods handle both sides of the problem: equations that are too complex (which fit the past closely but predict the future poorly) and equations that are too simple (which miss real patterns and so also predict the future poorly).
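One such method, offered here as an assumed example rather than the specific technique any particular company uses, is regularization: add a penalty that discourages large coefficients. The sketch below fits a single slope through the origin with an L2 (ridge) penalty, using made-up numbers, and shows the slope shrinking as the penalty grows.

```python
def ridge_slope(xs, ys, penalty):
    """Least-squares slope through the origin with an L2 penalty on the slope.

    Minimizes sum((y - w * x) ** 2) + penalty * w ** 2, which has the
    closed-form solution w = sum(x * y) / (sum(x * x) + penalty).
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + penalty
    return numerator / denominator


# Hypothetical data that roughly follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

for penalty in (0.0, 1.0, 10.0):
    print(penalty, round(ridge_slope(xs, ys, penalty), 3))
```

With no penalty you get the ordinary least-squares slope. Turning the penalty up trades some fit on the data you have for a simpler equation, and the right setting is usually chosen by checking predictions on data the model has not seen.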

You have enough independent variables when the ranking algorithm is producing usable output. But you’re probably never, ever, really done.

Ranking Algorithms in Analytics and Marketing Science

Ranking algorithms could have huge applications in analytics and the marketing sciences. We have a problem in that there’s just too much data. Ranking algorithms solve for that.

Such algorithms should work the same way that all ranking algorithms work:

  • A machine accepts independent variables.
  • A machine interprets those variables.
  • A machine produces an output that is predictive of an outcome.

That doesn’t mean that we generate scores for the sake of scores.

We generate scores for the sake of predicting some external dependent variable. And for that output to be valuable, it has to be predictive and accurate to some degree.

Predicting better marketing outcomes using ranking algorithms might be one of the most obvious applications.

***

I’m Christopher Berry.
I tweet about analytics @cjpberry
I write at christopherberry.ca