VARIABLE DIMENSION VARIANCE

What changed? — in sales, costs, margins, inventory, ROI, EBITDA,... - is a common question for any business 🏭.

Between this year and last year, this quarter and the previous quarter, this month and the previous month, this month the corresponding month of last year, this week and the corresponding week of last year (especially reserved for the brave of heart) or simply between actuals and budget.

The possibilities are endless.

If you understand what has changed, you are a step ahead in grasping why it changed and being able to react accordingly.

Enter multidimensional datasets.

A multidimensional dataset is a dataset with more than one dimension along which you can analyze your data. There is nothing wrong with multidimensional datasets. The world is multidimensional so you need to map it with multidimensional data.

On the other hand, the reason why your sales dashboard has 19 different filters is because somebody invented multidimensional datasets. Whoever invented slicing and dicing might not have sufficiently pondered the fact that, due to the curse of multidimensionality, you will never be able to adequately explore the myriad of routes offered by 19-filter dashboard.

It gets worse if, instead of playing with a Tableau dashboard, you are tasked at building the this-quarter-vs-last-quarter sales waterfall chart for your CEO’s presentation to the Board.

Say your dataset in mono-dimensional: you only have sales by market. It is trivial to map your 50M$ increase in revenues to the different markets.

Say you have — because you certainly have — also the sales by channel and by product line.

Which is the first item of your waterfall chart? To find out, you slice by market, then repeat by channel and again by product line. Whatever is largest will merit the honor of the first slot. It is “sales in Italy 🍕”.

Now the second item. To find it, you need to subtract “sales in Italy” from your dataset. Do the slice and dice thing again. See who wins. It is “sales in department stores” 🏬.

Somebody argues that the first slot more rightfully belongs to “sales in China 🐼”. The increase of “sales in Italy” is marginally higher than the increase of “sales in China”, but Italy, your home market, grew a lackluster 5% while China is up 100%.

Back to Excel and redo the whole thing.

In fact, after more slicing and dicing, it becomes clear that the boost of “sales in China” is due to a single product line, T-shirts 👕, while the disappointing performance of the Italian market is tied to the weakness of the department store 🏬 channel.

Given this evidence, it would help show “sales in China 🐼 of T-shirts 👕” as the first item.

Back to Excel and redo the whole thing.

After subtracting only “sales in China 🐼 of T-shirts 👕” from the first iteration, “department store” does not merit the second slot any more. Slice and dice and find another one.

After a few iterations you have explained 45M$ out of the 50M$ sales increase. You want to go home. “Other” is all you can answer if your boss asks. Let’s hope he won’t. 5M$ might actually be an error in your horribly complex Excel sheet.

Real-life multidimensional datasets have more than three dimensions, and simply cannot be processed in this manner.

There is a better, less painful, and more insightful, way to do this. It goes like this:

  • Recognize that the problem has many solutions and that it requires a fast-iteration approach
  • Set scenario (“I want to see the combinations that have changed the most in absolute value", or ”I want to see the combinations that have changed the most in terms of % change",...)
  • Let the computer calculate all the combinations, do the subtractions, come up with a set of consistent results that fully cover the change and are computationally error free
  • Change scenario and/or fine-tune parameters until the output is satisfactory.