A method is used to derive meaning from data.

Consider the dataset below.

 

Don’t get too excited. It’s data that I typed in, and it is pseudo-random. It is data in a form that can be manipulated and explained, but it doesn’t represent anything you’d find in nature. It is a fake dataset.

There are five variables. Spend is how much an analyst spent on a given campaign. Impressions is how many impressions were served through that campaign. Conversions is the number of purchases that were made. Revenue is how much money was booked against the campaign. And medium is the channel that delivered the campaign’s impressions and through which its results were attributed. This is a relatively simple dataset. Suspend disbelief that such a glorious thing could ever exist. It does.

Now it’s time for some context.

The mediums correspond to display advertising, social media marketing, and search engine marketing. The next block of SPSS syntax produces a frequency table for all five variables.
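The syntax itself isn’t reproduced here. As a rough sketch of what a frequency table computes, here is a hypothetical Python helper (`frequency_table` is my own name, not an SPSS command) that counts each distinct value and reports its percent and cumulative percent. The `medium` values are made up for illustration.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent, cumulative percent) for each distinct value."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0.0
    for value in sorted(counts):
        pct = 100 * counts[value] / total
        cumulative += pct
        rows.append((value, counts[value], pct, cumulative))
    return rows


# Hypothetical medium values, not the dataset above.
medium = ["display", "search", "social", "search"]
for row in frequency_table(medium):
    print(row)
```

For a numeric variable such as spend, the same helper simply lists every distinct amount, which is why frequency tables are most useful for categorical variables like medium.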

 

The table below lists summary statistics: mean, median, mode, standard deviation, and percentiles. Across 15 campaigns, my average spend was 520 dollars and my average revenue was 9,936 dollars. These averages, of course, are a huge simplification of what happened. But that’s what the measure is designed to do. Mean, median, and mode are measures of central tendency. They rest on the belief that the best way to explain what happened, to sum it up in a single bullet point, is with one central value such as the mean. Standard deviation and the percentiles are measures of dispersion: they describe how spread out the values are around that center.
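To see those measures concretely, here is a minimal sketch using Python’s standard `statistics` module. The `summarize` helper and the spend values are hypothetical, not the dataset above, and `statistics.quantiles` uses its default interpolation method, which may not match the percentile definition SPSS applied.

```python
import statistics


def summarize(values):
    """Central tendency and dispersion for one numeric variable."""
    p25, p50, p75 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),  # first mode encountered if several tie
        "std": statistics.stdev(values),  # sample standard deviation (n - 1)
        "p25": p25,
        "p50": p50,
        "p75": p75,
    }


# Hypothetical spend values, not the 15 campaigns above.
spend = [500, 520, 510, 520, 550]
print(summarize(spend))
```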

I can tell very quickly, by glancing at just one line, that my spend was fairly consistent and my revenue in return was not.
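Spend and revenue sit on very different scales, so comparing their standard deviations directly can mislead. One scale-free way to check the same intuition, my addition rather than a figure from the SPSS output, is the coefficient of variation: standard deviation divided by mean. The numbers below are made up for illustration.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a fraction of the mean."""
    return statistics.stdev(values) / statistics.mean(values)


# Hypothetical figures, not the dataset above.
spend = [500, 520, 510, 520, 550]
revenue = [2000, 15000, 4000, 9000, 20000]
print(round(coefficient_of_variation(spend), 3))
print(round(coefficient_of_variation(revenue), 3))
```

A value near zero means the variable barely moves relative to its average; a value near or above one means it swings about as much as its average.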

My next step is to create two new variables, spendrange and revenuerange, in an attempt to tell a simplified story. What happened when I spent less than average? What happened when I spent more than average? The next section of code yields a crosstab. My dependent variable is revenue range, and my independent variable is spend range. This is a very simple model that I’m communicating here.
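The recode-then-crosstab step can be sketched in plain Python. This is an assumption about the logic, not the author’s SPSS syntax: the hypothetical `split_at_mean` cuts a variable at its mean, and `column_percentages` reports each revenue group as a percentage of its spend column, which is how the crosstab percentages are read. Values exactly at the mean land in the upper group here; the original cutoff rule isn’t shown.

```python
from statistics import mean


def split_at_mean(values, low="below", high="above"):
    """Recode a numeric variable into two groups around its mean."""
    cutoff = mean(values)
    # Assumption: a value exactly at the mean counts as "above".
    return [low if v < cutoff else high for v in values]


def column_percentages(independent, dependent):
    """Crosstab with the independent variable as columns, as % of each column."""
    table = {}
    for col in sorted(set(independent)):
        in_col = [d for i, d in zip(independent, dependent) if i == col]
        table[col] = {row: 100 * in_col.count(row) / len(in_col)
                      for row in sorted(set(dependent))}
    return table


# Hypothetical campaigns, not the dataset above.
spend = [400, 450, 600, 650]
revenue = [5000, 6000, 20000, 3000]
spendrange = split_at_mean(spend)
revenuerange = split_at_mean(revenue)
print(column_percentages(spendrange, revenuerange))
```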

What you get looks like the output above. We’re concerned with the middle and bottom tables. The middle table has what I spent along the top and what I got along the side. The way to read the numbers inside the table is:

“When I spent less than 519 dollars, 72.7% of the time I got less than 9,936 dollars in return. When I spent more than 520 dollars, there’s a 50% chance I’d get more than 9,936 dollars in return.”

You see what I did there.

The table below that contains my symmetric measures, btau and ctau (Kendall’s tau-b and tau-c). I’ll take the btau of .213 (the ‘b’ stands for ‘square Box’ in my mind, since tau-b is the one suited to square tables like this 2 × 2). I disregard the Approx. Significance figure of .434 because I’m dealing with such a small n. What I think SPSS is telling me is that, based on 15 observations, I can’t make a very stable prediction about what 80,000 campaigns would look like. And that’s a fair assessment.
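For the curious, both statistics come from counting concordant and discordant pairs of campaigns. Below is a stdlib sketch; `kendall_tau_b_c` is a hypothetical helper, not an SPSS function. The counts are my inference from the quoted percentages, assuming all 15 campaigns land in the table: 72.7% can only be 8 of 11, leaving 4 campaigns above average spend, of which 50% is 2. They are not copied from the SPSS output, but they do reproduce the .213.

```python
import math


def kendall_tau_b_c(table):
    """Kendall's tau-b and Stuart's tau-c for a table of counts.

    Rows and columns must both be ordered from low to high (e.g. below, above).
    """
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Compare this cell with every cell in a later (higher) row.
            for i2 in range(i + 1, n_rows):
                for j2 in range(n_cols):
                    if j2 > j:  # higher on both variables
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:  # higher row, lower column
                        discordant += table[i][j] * table[i2][j2]
    s = concordant - discordant
    pairs = n * (n - 1) / 2
    # Pairs tied on the row variable, and pairs tied on the column variable.
    ties_rows = sum(t * (t - 1) / 2 for t in (sum(row) for row in table))
    ties_cols = sum(t * (t - 1) / 2 for t in (sum(col) for col in zip(*table)))
    tau_b = s / math.sqrt((pairs - ties_rows) * (pairs - ties_cols))
    m = min(n_rows, n_cols)
    tau_c = 2 * m * s / (n ** 2 * (m - 1))
    return tau_b, tau_c


# Rows: revenue below / above average; columns: spend below / above average.
counts = [[8, 2],
          [3, 2]]
tau_b, tau_c = kendall_tau_b_c(counts)
print(round(tau_b, 3), round(tau_c, 3))  # prints 0.213 0.178
```

For a 2 × 2 table, tau-b reduces to (ad - bc) divided by the square root of the product of the four marginal totals, which is why it behaves like a simple correlation here.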

Next up is a regression. I have to take the ‘medium’ variable and bust it out into three separate variables, called ‘dummy’ variables. That is to say, I have to take a nominal variable with several categories and turn it into a set of dichotomous (0/1) variables. It’s required for the regression to work: the regression treats its inputs as numbers, and channel names have no numeric order or spacing.
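A minimal sketch of dummy coding in Python, with a hypothetical helper (`dummy_code`) and made-up rows. One caution worth knowing: because every campaign belongs to exactly one medium, the three dummies always sum to 1, which duplicates the regression’s intercept. Usually one category is left out as the baseline; picking display for that below is arbitrary.

```python
def dummy_code(values, levels=None):
    """Expand one nominal variable into a 0/1 indicator column per category."""
    levels = levels if levels is not None else sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical medium values, one per campaign.
medium = ["display", "social", "search", "social"]
dummies = dummy_code(medium)

# With an intercept in the model, keep all but one column; the dropped
# category (display, chosen arbitrarily) becomes the baseline.
predictors = {name: col for name, col in dummies.items() if name != "display"}
print(predictors)
```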

That next block of syntax runs a graph and a regression. I’m asking the machine to tell me whether spend, display, social, search, and impressions are correlated with revenue. What I really want to know is whether any of them are predictive.

The graph lets you see all 36 relationships visually: the 6 × 6 grid formed by revenue and the five inputs. And I’m not seeing much. The correlations table below shows the strength of each of those relationships, along with its significance. The significance figures help you judge whether you’re getting fooled by randomness.
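Here’s a stdlib-only sketch of that grid of Pearson correlations, plus the usual t statistic for testing each r against zero. Turning t into a p-value needs the t distribution with n - 2 degrees of freedom, which Python’s standard library doesn’t provide (`scipy.stats.pearsonr` returns both r and a p-value). The function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails if a column is constant


def correlation_matrix(columns):
    """r for every ordered pair of variables: k variables give k * k cells."""
    return {(a, b): pearson(columns[a], columns[b])
            for a in columns for b in columns}


def t_for_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))  # undefined when |r| == 1


# Hypothetical columns for illustration only.
data = {
    "spend": [400, 450, 600, 650, 500],
    "impressions": [9000, 12000, 11000, 15000, 10000],
    "revenue": [5000, 6000, 20000, 3000, 9000],
}
matrix = correlation_matrix(data)
print(round(matrix[("spend", "revenue")], 3))
```

With six variables this produces 36 cells, but only 15 distinct pairs: the diagonal is always 1 and each off-diagonal value appears twice.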

And then, finally, the regression table itself:

The adjusted R Square is -.265. That is to say, the model doesn’t explain what’s going on in the data. The Coefficients table contains the estimated effect of each variable (B), its standard error, and the significance of that variable. The model doesn’t work, so there’s no point in reading further.
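Adjusted R Square penalizes the raw R² for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. It turns negative whenever R² < p/(n - 1), which is easy to hit with only 15 campaigns. A small sketch with hypothetical inputs; the 0.10 below is not the R² from this model.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical: a raw R^2 of 0.10 from 15 campaigns and 4 predictors.
print(round(adjusted_r2(0.10, 15, 4), 3))  # prints -0.26
```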

All these methods are violent. They destroy a huge amount of information. And sometimes you can accidentally destroy the most salient points.