MAX DATASET SIZE
Each file you upload can have a maximum size of 200MB.
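As a rough illustration, a file can be checked against this limit before uploading. This is a minimal sketch, not part of the app: the function names are hypothetical, and it assumes "200MB" means 200 × 1024 × 1024 bytes, which the app does not state.

```python
import os

# Assumption: "200MB" is read as 200 * 1024 * 1024 bytes; the exact
# byte count enforced by the app is not documented here.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def size_within_limit(size_bytes: int, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if a size in bytes does not exceed the upload limit."""
    return size_bytes <= limit_bytes


def file_within_limit(path: str, limit_bytes: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` does not exceed the upload limit."""
    return size_within_limit(os.path.getsize(path), limit_bytes)
```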
A large number of dimension columns can degrade performance to the point of a poor user experience. Additional dimension columns add little or no value if they are not correlated with the variance value.
DROPPING COLUMNS UNCORRELATED TO VARIANCE
In multiple dimension variance calculations, the app tries to estimate the correlation of the different dimension columns with the variance value.
If necessary, the app drops the least correlated columns. In the example below, the columns with the least correlation are dropped from the analysis.
The columns are dropped from the whole database, so they will not be plotted in any graph. To avoid this, simply set the 🏃🏻♀️Run widget to Fix dimension bridge.
To see the correlation table and check whether columns have been dropped, click on ➕ Dataset info.
All items in each column are split into two groups so that the sums of the Amount column for the two groups are as similar as possible. The correlation is then calculated between the variance metric and the 0 or 1 value of each newly created "group" column.
The idea is that, for a given column, the higher the correlation, the greater the impact on variance of an item belonging to one group rather than the other.
This calculation should give a reasonably good idea of how much the different values in that column affect variance. Low correlations should indicate cases in which the variance is simply proportional to the size of a given item: for example, China changed the most simply because it is the largest market, while all markets changed proportionately.
For instance, in the case above, all items by Discount Band are moving basically "in step", indicating low correlation: being part of one Discount Band or another is a poor predictor of variance size.
By contrast, there seems to be more going on by Segment, where the top two Segments follow a different trend from the bottom two.
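The grouping-and-correlation calculation described above can be sketched in Python with pandas. This is only an illustration under stated assumptions, not the app's actual implementation: the function names, the row-level `Variance` column, the greedy largest-first split (a heuristic that approximates, but does not guarantee, the most balanced split), and the use of the absolute Pearson correlation are all assumptions made for this example.

```python
import pandas as pd


def split_into_two_groups(totals: pd.Series) -> dict:
    """Greedy heuristic: take items from largest to smallest total and put
    each one in the group whose running sum is currently smaller."""
    sums = [0.0, 0.0]
    assignment = {}
    ordered = totals.abs().sort_values(ascending=False, kind="mergesort")
    for item, amount in ordered.items():
        group = 0 if sums[0] <= sums[1] else 1
        assignment[item] = group
        sums[group] += amount
    return assignment


def column_correlation(df, column, amount_col="Amount", variance_col="Variance"):
    """Absolute correlation between the 0/1 group flag and the variance.
    The column names are hypothetical defaults."""
    totals = df.groupby(column)[amount_col].sum()
    group_flag = df[column].map(split_into_two_groups(totals))
    corr = group_flag.corr(df[variance_col])  # Pearson by default
    return 0.0 if pd.isna(corr) else float(abs(corr))


def correlation_table(df, dimension_cols, **kwargs):
    """Correlation per dimension column, most correlated first."""
    scores = {col: column_correlation(df, col, **kwargs) for col in dimension_cols}
    return pd.Series(scores).sort_values(ascending=False)


# Made-up data: variance differs by Segment but not by Discount Band.
example = pd.DataFrame({
    "Segment": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "Discount Band": ["Low", "High"] * 4,
    "Amount": [210, 190, 50, 50, 150, 150, 100, 100],
    "Variance": [10, 10, 10, 10, -10, -10, -10, -10],
})
print(correlation_table(example, ["Segment", "Discount Band"]))
```

In this made-up data, Segment ranks first because segments A and B move one way while C and D move the other, whereas the two Discount Bands have identical variance values. Under this sketch, Discount Band would be the column to drop if one had to go.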