VARIABLE DIMENSION PLOT
Variable dimension bridge analysis is something you probably rarely saw in action possibly also because it is hard to implement in Excel. However, it is nothing new. When you say "the top sales driver was the Chinese market (dimension➡️country), the second direct sales (dimension➡️channel) to large accounts" (dimension➡️segment) you are doing variable dimension variance analysis: you are changing the dimensions of each result.
The app computes all possible combinations, ranks them according to user-defined parameters, and returns a set of facts that explain the change.
The dimension columns of every row vary both in number and in type, and are automatically selected by rank of importance. The app looks for the combinations that better explain the variance. In this example all row results have a single dimension.
Since the selected combinations are the most significant, this allows to explain significant positive and negative change with a limited number of results.
It is possible to change the ranking parameters, for example ask the app to give weight to % change instead of just to absolute value change.
If the standard number of requested results - five - is not enough to cover all the variance, a "residual" entry will appear. This is completely normal and does not indicate an error.
As in fix-dimension bridge analysis, every row is net of the rows above. However, since the dimensions change in number and type, after a few rows it can be difficult to keep track of what has been filtered and what not. By default the app returns five row results.
AUTOMATING THE ANALYSIS
What changed? — in sales, costs, margins, inventory, ROI, EBITDA,... - is a common question for any business 🏭.
Between this year and last year, this quarter and the previous quarter, this month and the previous month, this month the corresponding month of last year, this week and the corresponding week of last year (especially reserved for the brave of heart) or simply between actuals and budget.
The possibilities are endless.
If you understand what has changed, you are a step ahead in grasping why it changed and being able to react accordingly.
Enter multidimensional datasets.
A multidimensional dataset is a dataset with more than one dimension along which you can analyze your data. There is nothing wrong with multidimensional datasets. The world is multidimensional so you need to map it with multidimensional data.
On the other hand, the reason why your sales dashboard has 19 different filters is because somebody invented multidimensional datasets. Whoever invented slicing and dicing might not have sufficiently pondered the fact that, due to the curse of multidimensionality, you will never be able to adequately explore the myriad of routes offered by 19-filter dashboard.
It gets worse if, instead of playing with a Tableau dashboard, you are tasked at building the this-quarter-vs-last-quarter sales waterfall chart for your CEO’s presentation to the Board.
Say your dataset in mono-dimensional: you only have sales by market. It is trivial to map your 50M$ increase in revenues to the different markets.
Say you have — because you certainly have — also the sales by channel and by product line.
Which is the first item of your waterfall chart? To find out, you slice by market, then repeat by channel and again by product line. Whatever is largest will merit the honor of the first slot. It is “sales in Italy 🍕”.
Now the second item. To find it, you need to subtract “sales in Italy” from your dataset. Do the slice and dice thing again. See who wins. It is “sales in department stores” 🏬.
Somebody argues that the first slot more rightfully belongs to “sales in China 🐼”. The increase of “sales in Italy” is marginally higher than the increase of “sales in China”, but Italy, your home market, grew a lackluster 5% while China is up 100%.
Back to Excel and redo the whole thing.
In fact, after more slicing and dicing, it becomes clear that the boost of “sales in China” is due to a single product line, T-shirts 👕, while the disappointing performance of the Italian market is tied to the weakness of the department store 🏬 channel.
Given this evidence, it would help show “sales in China 🐼 of T-shirts 👕” as the first item.
Back to Excel and redo the whole thing.
After subtracting only “sales in China 🐼 of T-shirts 👕” from the first iteration, “department store” does not merit the second slot any more. Slice and dice and find another one.
After a few iterations you have explained 45M$ out of the 50M$ sales increase. You want to go home. “Other” is all you can answer if your boss asks. Let’s hope he won’t. 5M$ might actually be an error in your horribly complex Excel sheet.
Real-life multidimensional datasets have more than three dimensions, and simply cannot be processed in this manner.
There is a better, less painful, and more insightful, way to do this. It goes like this:
- Recognize that the problem has many solutions and that it requires a fast-iteration approach
- Set scenario (“I want to see the combinations that have changed the most in absolute value", or ”I want to see the combinations that have changed the most in terms of % change",...)
- Let the computer calculate all the combinations, do the subtractions, come up with a set of consistent results that fully cover the change and are computationally error free
- Change scenario and/or fine-tune parameters until the output is satisfactory.
Variable dimension bridge analysis returns many result combinations, which are all algorithmically correct, but might be more or less insightful, and therefore useful to you.
To increase the probability of surfacing the more insightful scenarios 'Mparanza uses heuristics: it offers a certain number of "scenarios". Each "scenario" is a set parameter settings, designed to maximize the probability that the result will have certain characteristics. This can be simpler than adjusting parameters.
The default scenario is the "choose the largest combinations" scenario that essentially returns the combinations in order of variance size.
In default mode, given the choice of a scenario, the app will select the top ranked combination as the top result, net the value of that combination from all remaining combinations, and run the loop again, to return the second result, and so on.
Without changing parameters, you can, for instance ask the app, to skip the first result, and select the second top ranked combination as the top result. This will result in a different set of results, without changing parameters.
You can fine tune the result, by modifying the single parameters.
MANAGING ALTERNATIVE RESULTS
The interesting thing about variable dimension bridge analysis is that it is a problem that has multiple solutions. Multidimensional datasets equals multiple solutions which, on the bright side 👍, equals multiple ways of quickly looking into the business and hopefully getting useful insights 💡.
The only way to avoid having multiple solutions is not to use multidimensional datasets in your bridge analysis, which is a pity because you loose a lot of insights. If you do the work in Excel, you won't avoid the problem of multiple solutions, but you might wisely ignore it, given the time it takes to do an iteration 🙂.
With 'Mparanza - not necessarily with Excel 😉 - all these solutions will be mathematically correct. Hopefully, a few will be insightful and actionable. How do you find these precious gems 💎?
Short answer - not with machine learning. By leveraging on human-machine👨💼🤖 collaboration. Machine learning and AI are supercool. However as the term implies, the machine needs to "learn".
In order for the computer to learn to recognize a cat 🐱, (i) you need to know what a cat looks like and (ii) you need to show the computer lots of cat pictures 😼😹🙀😾😿😻😺😸😽, and tell it these are all cats, so if it finds something similar, it can tag it as a cat 🐈.
In bridge analysis, the cat we are trying to recognize is an insightful report. It is difficult to prepare a training dataset of insightful reports so the machine can learn to recognize them. Insightful is in the eye of the beholder. We might get there one day. For the moment, 'Mparanza's approach is to make it easy and quick to iteratively tweak the report until we finally get what we want.
We simply look at the file and try to automatically adjust the parameters. Hopefully we get something insightful in the first run. If not, you will need to change the parameters manually and iterate.
VARIABLE DIMENSION VARIANCE CHART
You need to click on the submit button below to actually run the report.
The report shows the dimension columns, the variance of each row, and, on hover, the cumulative sum of the variance.
Please note that the value of each row is net of the values of the rows above it.
For instance the first row of the example above has "steel-gold" in the Material column and nothing in the other columns (see table below the chart).
This means that the values of all subsequent volume & mix rows are net of steel-gold product sales. The volume & mix variance of steel-gold product sales is fully explained in the first row.
You can download both chart and table images by clicking on the camera icon.
You can download the table in csv format by clicking on the ➕ symbol on the right above the chart.
ENABLE PLOT COMMENTS
It is possible to add comments to the chart, in the style suggested by the IBCS standard.
Enable comments by setting the "enable plot comments" widget to True.
You can find the "enable plot comments" checkbox in the Run widget expander.
Hit the Submit button.
A text box will appear above each chart.
Add your comment, one chart at a time.
Once you have finished writing the comment for a chart, click on the 🚀 Submit.
Your comment will appear above the chart title
The downloaded image of the chart contains the comment.
Below an example of an output report...
...and of the drilldown report of the first row (complication = "Smart Watch") of that report.
The drilldown report of a given row can help better understand "what is behind" this result row. In this case, Smart Watch sales did particularly well in Steel-Gold and in Bronze Material.
By default, the app will try to generate a drilldown report of each of the five result rows.
It might take time for the app to process all result row drilldowns. Set the "Drilldown all results" widget to False if you do not want to see drilldown reports or you want to choose which report rows to drilldown.
If the app cannot return the drilldown detail of a given row it will display a warning message.
In this case for example given the following report...
...it is not possible to generate a drilldown report of row five, since that result is already fully detailed on all the three dimensions.
The drilldown report will breakdown a given report row result into a set of sub-components. In this example, the second result above (brand="Ward"), totaling -4.49M, is broken down in four elements.
Open the expander below the "drilldown row" widget of each report row result to fine-tune the drilldown parameters.
For instance, if you do not find the drilldown report above useful, you can try returning an alternative result...
...and you will get a different, alternative but consistent (same -4.49M total), drilldown result.
For example, with three result rows instead of four.
If you do not find this new drilldown detail useful, you can try returning another alternative report...
...that, in this example, provide details of Channel and Customer and still totals (in this case after a -129 balance) to -4.49M.
MOVE DRILLDOWN RESULT TO MAIN REPORT
A "select drilldown rows to move to main report" widget will appear to the right of each drilldown widget.
This allows you to move a result of your drilldown report back into (a modified version) of your main report.
For instance, this is your main report...
...and this is the drilldown report of the first row.
You want to detail your first "material = steel-gold", 6.55M, result row by replacing this row result with - say - the first three rows of the drilldown. Select those three rows (with values of 5.29M, - 2.49M, 2.38M respectively) in the widget and hit submit.
You get a new report. The first 6.55M row has indeed been substituted with the three (5.29M, - 2.49M, 2.38M) results from the drilldown report. The results below might have changed since the sum of these three rows is different from the sum of the original first result.
If you decide to replace the first result row with all 5 of its drilldown components...
..the results below the inserted rows will better - but not necessarily fully - correspond with the original output.
Below an example of an output
Setting the "Let me fine-tune parameters" widget to True allows you to modify the parameters used to rank combinations, and can help return a more meaningful set of results.
Below the list of parameters with a short explanation.
MAX NUMBER OF NODES PER RESULT
For simplicity's sake, you might only want to see results with up to a certain number of nodes/dimensions.
PARAMETER SETTING OPTIONS
The app has four "standard" parameter setting "scenarios" that might returns four different sets of results.
The available parameter settings are:
Choose largest combination - returns a set of combinations chosen based on their variance absolute value, with no or little adjustment for other elements such as % of increase, number of nodes, unique number of elements.
Limit first result size - limits the maximum size of the top result.
Choose combinations with more nodes - privileges combinations with higher number of nodes
Choose higher growth combinations - privileges combinations with higher rate of positive or negative revenue growth.
PARAMETER AGGREGATION ALGORITHM
The app ranks the different combinations and selects the top results, giving appropriate "weights" to the different parameters based on the chosen scenario. This requires that the values of the different parameters be normalized, so they can be aggregated and ranked together.
There are different possible ways - different algorithms - to perform this normalization, and each algorithm might return a different ranking order and therefore different results. The app uses the simple min-max normalization as the default. If you are unsatisfied with the results you can try another normalization method. Some normalization method normalize the variance amount and percentage change differently (in fact giving them more weight) from the other parameters.
NUMBER OF DESIRED RESULTS
Simply sets the number of rows in the returned result dataset. Defaults to 5 and can me brought up to 10. If the cumulative variance converges before the desired number of results, the tool will naturally will stop before.
MAX % OF TOTAL VARIANCE PER RESULT
Sets the maximum "size" of each item in the returned result set. If one item "eats up" a large part of the variance and therefore results in a skewed set of returned rows, you can exclude that item from the accepted results.
MIN % OF TOTAL VARIANCE PER RESULT
Sets the minimum "size" of each item in the returned result set. Use to exclude non relevant items, for instance when the parameters are set to give weight to variance increase in percent.
VARIANCE AMOUNT WEIGHT
If set to 1, the code will weigh the potential result items in proportion to the absolute value of their variance. Otherwise, weight will also be given to the percentage level of change of the item. Useful to highlight items that have changed a lot, even if the absolute value of their variance is modest.
NUMBER OF NODES WEIGHT
If set to 0, the code will not give weight to the number of nodes of a given potential result row. When set to a value greater than zero, more weight will be given to results with more nodes, even if their variance value is the same.
For instance the "volume variance pants Italy" combination has two nodes(country, product type) while the "volume variance red shirts Milan" combination has four nodes (country, city, product type, color) because Milan-city is a hierarchical child of Italy-country. If the parameter is set to more than zero the four-node result will be ranked higher than to the two-node result, even if the absolute value of the two variances is the same.
Useful to highlight items with high number of nodes.
UNIQUE NUMBER OF NODES WEIGHT
If set to 0, the code will not give weight to the number of total items contained in the columns of a given potential result row. Otherwise results "chosen" out of columns with more items will be weighted more.
The system counts the unique items in each dimension column, and calculates the total items of each combination. For instance, if we operate in a total of 5 cities, in 2 countries, where we sell a total of 10 different products in 4 different colors, the combination "volume variance pants Italy" will have a total items value of 12 (10 for the product dimension plus 2 for the country dimension).
The combination "volume variance red shirts Milan" will have a total items value of 21 (10 for the product dimension plus 4 for the color dimension, plus 2 for the country dimension, plus 5 for the city dimension). In practice, in this case, combinations that contain a specific product (of which there are 10) will be weighted more than combinations that contain a specific city (of which there are 5) because they are assumed to be more interesting.
In "variable dimension" mode, the app returns a set of results among many more possible options. This set of "chosen" results might not be the ones you expect, or that are meaningful to you. You can modify the results by changing the parameters.
Here is a set of troubleshooting tips you might find interesting.
DROP NON RELEVANT DIMENSION FROM RESULTS
In this example Ward Kermit, a buyer, seems to be the principal "cause" of the loss of sales. In fact the "Buyer" column is probably not a good "explainer" of the change of sales. In this specific case, Mr. Ward simply left the company at the beginning of the year, but the Plan was not corrected to take that into account. So Mr. Ward appears to explain a large part of the difference between Plan and Actual.
Other than correcting the Plan and substituting the new buyer's name to Mr. Ward, the best course of action here is to simply drop the non-significant column from the results.
This will result in a markedly different, and hopefully more insightful set of results that do not have the "Buyer" column.
SHOW DIMENSION IN EVERY ROW
Sometimes, for clarity, we need a given dimension to show up in every result row.
In this same example, it might be interesting to find out in which Store Type the Womens category is having trouble.
This widget forces a given dimension to appear in every row.
This result is returned.
If it is not, the best option is to play along with the parameters to tweak the result. For instance, we can try returning an alternative result row with this widget...
...and see if we get something better.
In this case we get result rows with many dimension, but that have a low variance value. As a consequence the balance value after five results is very high.
Please note that all these different results are computationally correct and show different sets of facts. Setting the widget to a different value...
...will return yet a different result, in this case with similar characteristics.
RESULTS DO NOT CONVERGE TO TOTAL VARIANCE
Sometimes you might get something like this. The result items are too small and therefore do not converge to the total.
The app is choosing too small combinations and therefore it is converging too slowly to the total variance.
One simple fix could be to increase the minimum accepted result size value.
This will partially solve the problem and return a set of larger results that converge more quickly.
Since the issue here was that the app is returning lots of low value results, and not giving enough weight to the amount, another approach could be to change the parameter aggregation algorithm and choose an algorithm that "over-weighs" variance amount.
Now the app returns this set of results which is definitively more weighted on their amount.
DEALING WITH DIMENSIONALITY
What if my dataset has many dimensions, say more than three or four? Does multidimensional variance analysis still provide useful insights once the number of dimensions starts becoming a challenge for our - human - capacity of understanding?
This is the core challenge of 'Mparanza. 'Mparanza is designed to manage multidimensional datasets, which in itself poses non-trivial technical challenges. However, once these challenges are met, we are left with a critical question: will the - inevitably complex - result of the computation be meaningful for the human user?
This is especially true when the dimensions are independent. When the columns are tied with hierarchical relationships (for instance city-region-country, or product-brand-category) the processing algorithm is relatively straightforward: the number of possible combinations between dimensions remains limited. More importantly, human intuition has a much easier time finding grips that lead to understanding.
If dimensions are independent, the number of combinations explodes. The app takes more time to run. It also becomes a more difficult for the human user to understand the interrelation between dimensions.
An example of a "tough" dataset is "30 Rewards". This dataset is not especially big, only 32,000 rows, but it has ten independent columns and returns 240k combinations. It takes more time to run.
Leave the default options, hit submit, wait a little while.
The tool returns 5 out 240k combinations that "explain" the change. You need to understand that this particular set of results is but one of hundreds, if not thousands, of other combinations of results that "explain" the change in revenues. 'Mparanza allows you to quickly search for and analyze the most insightful sets of combinations. Something that would be practically impossible to do with Excel.
This particular combinations enlightens us to the fact that most of the growth is due to Online sales, that have growth 22% year over year.
To understand more, we can drilldown on the first result row. We find out that it is mostly "non-active", "cristal" customers, that are driving revenues and growing at over 30% per year. In this second report, we are looking at the interplay of 6 different dimensions (Market, Active, Category, Payment, Channel, Lever) that explain a change of 310M.
This second report might or might not deliver the actionable insights needed the a specific situation.
The tool offers a variety of tricks and functionalities to help pin down the "right" set of combinations.