
MAX DATASET SIZE

Each file you upload can have a maximum size of 200MB.
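A minimal pre-upload check can be sketched in Python. It assumes that "200MB" means 200 * 1024 * 1024 bytes; the app may instead use 200,000,000 bytes. The names `MAX_UPLOAD_BYTES` and `fits_upload_limit` are hypothetical and are not part of the app.

```python
import os

# Assumption: "200MB" is read as 200 MiB (200 * 1024 * 1024 bytes).
# The app may use a decimal definition (200 * 10**6) instead.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def fits_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is no larger than `limit` bytes."""
    return os.path.getsize(path) <= limit
```

For example, `fits_upload_limit("sales.csv")` (hypothetical file name) returns False for any file above the assumed limit. That lets you split or trim a dataset before uploading it.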

Having many dimension columns can degrade performance unacceptably, leading to a poor user experience. Additional dimension columns may add little or no value if they are not correlated with the variance value.

CORRELATION TABLE

The app tries to estimate the correlation of each dimension column with the variance value.

If necessary, the app drops the least-correlated columns. In the example below, the columns with the lowest correlation have been dropped from the analysis.

To see the correlation table and check whether any columns have been dropped, click ➕ Dataset info.

[Figure: correlation table shown under ➕ Dataset info]
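A minimal sketch of the "drop the least-correlated columns" step, written in Python. The documentation does not say how many columns the app keeps or how it ranks them. This sketch therefore assumes a hypothetical cap, `max_columns`, and ranks columns by the absolute value of their correlation. The function name `select_dimension_columns` is also hypothetical.

```python
def select_dimension_columns(
    correlations: dict[str, float], max_columns: int
) -> tuple[list[str], list[str]]:
    """Split columns into (kept, dropped), keeping the `max_columns` most correlated.

    Assumptions (not taken from the app): columns are ranked by absolute
    correlation, and `max_columns` is a hypothetical cap on how many
    dimension columns are analysed.
    """
    # Sort column names from strongest to weakest correlation, ignoring sign.
    ranked = sorted(correlations, key=lambda col: abs(correlations[col]), reverse=True)
    return ranked[:max_columns], ranked[max_columns:]
```

For instance, with hypothetical values `{"Segment": 0.62, "Country": 0.35, "Product": -0.41, "Discount Band": 0.05}` and `max_columns=3`, the sketch keeps Segment, Product and Country and drops Discount Band.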

CORRELATION CALCULATION

The items in each column are split into two groups so that the totals of the Amount column for the two groups are as similar as possible. The correlation is then calculated between the variance metric and the 0/1 value of each newly created "group" column.

The idea is that, for a given column, the higher the correlation, the greater the impact on variance of an item belonging to one group rather than the other.
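The two steps above can be sketched in Python. The documentation does not say how the app finds the two balanced groups. This sketch swaps in a common greedy heuristic: visit items from largest to smallest total Amount and add each one to whichever group currently has the smaller total. The heuristic is fast, but it does not guarantee the most balanced split, and it assumes non-negative amounts. The sketch also assumes the correlation is an ordinary Pearson correlation computed row by row. All function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def split_into_two_groups(items, amounts):
    """Assign each distinct item to group 0 or 1 so the two Amount totals are close.

    Greedy heuristic (an assumption, not necessarily the app's method):
    visit items from largest to smallest total Amount and put each one
    into the group with the smaller running total.
    """
    totals = defaultdict(float)
    for item, amount in zip(items, amounts):
        totals[item] += amount
    group_sums = [0.0, 0.0]
    assignment = {}
    for item in sorted(totals, key=totals.get, reverse=True):
        group = 0 if group_sums[0] <= group_sums[1] else 1
        assignment[item] = group
        group_sums[group] += totals[item]
    return assignment


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def column_correlation(items, amounts, variances):
    """Correlate each row's variance with the 0/1 group of its item in this column."""
    assignment = split_into_two_groups(items, amounts)
    group_flags = [assignment[item] for item in items]
    return pearson(variances, group_flags)
```

Which group is labelled 1 is arbitrary, so in this sketch only the magnitude of the result says anything about how strongly the column relates to variance.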

This calculation should give a reasonably good indication of how much the different values in a particular column affect the variance. Low correlations should indicate cases in which the variance is simply proportional to the size of each item. For example, China changed the most simply because it is the largest market, while all markets changed proportionately.

[Figure: example]

For instance, in the case above, all items move essentially "in step" across Discount Bands, indicating low correlation: belonging to one Discount Band or another is a poor predictor of variance size.

[Figure: items by Discount Band]

By contrast, more seems to be going on by Segment, where the top two Segments follow a different trend from the bottom two.

[Figure: items by Segment]