COLUMN DETECTION

How does 'Mparanza recognize that a column is a dimension column such as "country", "channel", "product"?

Because it is not numeric. You can disable this behavior if you want: maybe the dimension columns along which you want to analyze your dataset are accounting codes. 'Mparanza will do its best to detect them anyway. At the bottom of the printout you will find the list of the dimension columns that were detected. If two dimension columns are "equivalent" (every value in one column corresponds to a single value of the second column), 'Mparanza will discard one of the two columns.
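
As a rough illustration of the "not numeric" rule, here is a minimal sketch. It is not 'Mparanza's actual code: the function names `is_number` and `detect_dimension_columns` and the sample rows are assumptions made for this example.

```python
def is_number(value):
    """Return True if the value can be parsed as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def detect_dimension_columns(rows):
    """Return the names of the columns whose values are not all numeric."""
    columns = list(rows[0].keys())
    return [
        name for name in columns
        if not all(is_number(row[name]) for row in rows)
    ]


# Hypothetical sample data, as it might be read from a CSV file.
rows = [
    {"country": "IT", "channel": "retail", "amount": "120.5"},
    {"country": "FR", "channel": "online", "amount": "80"},
]
dimension_columns = detect_dimension_columns(rows)
print(dimension_columns)  # ['country', 'channel']
```

With a rule like this one, a column of purely numeric accounting codes would be treated as a number and not as a dimension, which is the case described above.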

METRIC AND DRIVER COLUMN DETECTION

To detect the other columns - the monetary column that contains the amounts, the period column that contains the time periods, and the unit column that contains the volumes - 'Mparanza uses the same approach a human would use. It looks at the column name.

The advantage of this approach is that it is quick and efficient. Once you have named your columns in a way that 'Mparanza understands, it will load the dataset automatically every time, without asking you any more questions or taking any more of your time.

The code tries to recognize the metric and driver columns (cost/revenue, quantity/volume, discount, COGS, period, date, category weighted distribution, ...) by searching for stems. For instance, if "amount" is part of a column name, it will assume that the column is the cost/revenue column. If it does not find the expected stems, the code will ask you to rename the columns in a way it understands.

For instance, the monetary column name must contain a stem that 'Mparanza recognizes as associated with a monetary column. The list of these stems is printed if you select "Run" > "Data Profiling". If it finds a column whose name contains one of these stems, it will tell you that it has tagged that particular column as the monetary value column.

It works the same way with the Unit/Volumes columns, the Date/Period columns, and the other metric columns.

This means that if your column name does not contain one of the stems in our list, the column will not be recognized unless you rename it in your dataset. In that case 'Mparanza will print a warning message.
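
The stem search can be pictured with the following sketch. It is only an illustration under assumptions: the `STEMS` lists are hypothetical (only "amount" is taken from the text above; the real lists are the ones printed by "Run" > "Data Profiling"), and the function names `tag_columns` and `missing_roles` are invented for this example.

```python
# Hypothetical stem lists. Only "amount" comes from the text above;
# the other stems are assumed for illustration.
STEMS = {
    "monetary": ["amount"],
    "unit": ["quantity", "volume"],
    "period": ["period", "date"],
}


def tag_columns(column_names, stems=STEMS):
    """Map each role to the first column whose name contains one of its stems."""
    tags = {}
    for role, role_stems in stems.items():
        for name in column_names:
            if any(stem in name.lower() for stem in role_stems):
                tags[role] = name
                break
    return tags


def missing_roles(tags, stems=STEMS):
    """Return the roles for which no column name matched any stem."""
    return [role for role in stems if role not in tags]


tags = tag_columns(["Country", "Net_Amount_EUR", "Sales_Volume", "Month"])
print(tags)                 # {'monetary': 'Net_Amount_EUR', 'unit': 'Sales_Volume'}
print(missing_roles(tags))  # ['period']
```

In this sketch the column "Month" is not recognized because it contains neither "period" nor "date". Renaming it to something like "Month_Period" would make it match, which mirrors the renaming advice above.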

If your dataset comes from a SQL query, changing the query just to make 'Mparanza happy is not really an option and renaming the column manually every time is a drag. We know that our list is far from complete. If your favorite column name is not in our list, please send us a message and we will happily include it.

EQUIVALENT COLUMN DETECTION

If the code detects that two columns are equivalent (for instance customer name and customer code), it will drop the second of the two. If you want to keep the second column instead, reverse the order of the two equivalent columns in your dataset.
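
A minimal sketch of this behavior follows. It assumes that "equivalent" means a one-to-one correspondence between the values of the two columns. The function names `are_equivalent` and `drop_equivalent_columns` and the sample rows are hypothetical, not 'Mparanza's own code.

```python
def are_equivalent(rows, first, second):
    """True if each value of `first` pairs with exactly one value of `second`, and vice versa."""
    forward, backward = {}, {}
    for row in rows:
        a, b = row[first], row[second]
        # setdefault stores the pairing the first time and returns the stored one afterwards.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def drop_equivalent_columns(rows, dimension_columns):
    """Keep the first column of each equivalent group, following the dataset order."""
    kept = []
    for name in dimension_columns:
        if not any(are_equivalent(rows, earlier, name) for earlier in kept):
            kept.append(name)
    return kept


rows = [
    {"customer_name": "Acme", "customer_code": "C001", "country": "IT"},
    {"customer_name": "Bolt", "customer_code": "C002", "country": "IT"},
    {"customer_name": "Acme", "customer_code": "C001", "country": "FR"},
]
name_first = drop_equivalent_columns(rows, ["customer_name", "customer_code", "country"])
code_first = drop_equivalent_columns(rows, ["customer_code", "customer_name", "country"])
print(name_first)  # ['customer_name', 'country']
print(code_first)  # ['customer_code', 'country']
```

The two calls show why the column order matters: whichever of the two equivalent columns comes first is the one that survives.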

Please check that the equivalent columns in your dataset have been identified and dropped.

If not, the two columns are probably not perfectly equivalent. You can check this with the profiling report. Please correct the data and re-submit. Performance can degrade very significantly if the code has to process two "identical but not really identical" columns with high cardinality.
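
If you prefer to look for the offending values yourself before re-submitting, a sketch along these lines may help. It runs outside 'Mparanza. The function name `find_mismatches` and the sample data are assumptions for illustration.

```python
from collections import defaultdict


def find_mismatches(rows, first, second):
    """Return the values of `first` that appear with more than one value of `second`."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[first]].add(row[second])
    return {value: sorted(others) for value, others in seen.items() if len(others) > 1}


rows = [
    {"customer_name": "Acme", "customer_code": "C001"},
    {"customer_name": "Acme ", "customer_code": "C001"},  # note the trailing space
    {"customer_name": "Bolt", "customer_code": "C002"},
]
by_code = find_mismatches(rows, "customer_code", "customer_name")
by_name = find_mismatches(rows, "customer_name", "customer_code")
print(by_code)  # {'C001': ['Acme', 'Acme ']}
print(by_name)  # {}
```

Here the code "C001" is paired with two spellings of the same name, so the two columns are "identical but not really identical". Run the check in both directions, since a mismatch may only show up in one of them.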